
Project Nature and Content - Computer Vision:

Sample code is available in this GitHub repository: https://github.com/KartikNW/MSDS_458_Public.

Github 101: GitHub101.pdf https://canvas.northwestern.edu/courses/221322/files/19844794/download?wrap=1

Think of the first assignment as serving multiple purposes: (1) exploring neural nets and seeing how they work on a very simple problem, (2) examining alternative neural net structures with a simple, single-hidden-layer network, and (3) learning how to fit a neural network directly in Python (or with Scikit-Learn, TensorFlow, or Keras). This first assignment gives you a choice as to which of these objectives to emphasize.

Bottom line. You may choose the vision data set that you will be looking at, assuming that it is a simple alphabetic or numeric data classification problem. And you may choose the Python coding framework that you use to build the neural net. The neural network should be a fully connected (dense) neural network with a single hidden layer.

This first assignment deals with neural networks for classification of images. The structure of the network should be simple, with only one internal/hidden layer. The intent of the assignment is to give you hands-on, practical experience not only in designing, training, and assessing a neural network and interpreting the impact of hyperparameters, but also in going one step further.

Regarding exploration, the goal is to understand how the neurons/nodes in a simple single-hidden layer network have learned to represent features within the input data.

Regarding the management problem for this assignment: suppose you are asked to develop a neural network model for digit classification. How would you go about training such a model? How would you judge the model's accuracy in digit classification with real data examples, such as customer or client handwritten digits on paper?

You will do this exclusively using the backpropagation learning method. You will have gathered and preprocessed your data, designed and refined your network structure, trained and tested the network, varied the hyperparameters to improve performance, and analyzed/assessed the results.

The most important thing is not just to report a summary of classification rates/errors; I trust that you can get a working classifier, or can train a network to do any useful task.

The important thing is to identify, for each class of input data, what it is that the hidden nodes are responding to.

You may use MNIST data for the first assignment. You can train and test a classifier on this data. But the core challenge is still to figure out what it is that the hidden nodes are responding to, and making the task more complex will not change this core focus. You need to conduct at least the following 5 experiments for this data in order to gain some useful insights. You are welcome to conduct more experiments.

EXPERIMENT 1: Our dense neural network will consist of 784 input nodes, a hidden layer with 1 node, and 10 output nodes (corresponding to the 10 digits). We use mnist.load_data() to get the 70,000 images, divided into a set of 60,000 training images and 10,000 test images. We hold back 5,000 of the 60,000 training images for validation. After training the model, we group the activation values of the hidden node for the remaining 55,000 training images by the 10 predicted classes and visualize these sets of values using a boxplot. We expect the overlap between the ranges of values in the "boxes" to be minimal. In addition, we find the pattern that maximally activates the hidden node, as a "warm-up" exercise for similar analysis we will perform on CNN models in Assignment 2.
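
For a single dense hidden node, the input pattern that maximally activates it (under a fixed input norm) is simply the direction of the node's weight vector, which can be reshaped to 28x28 and displayed. A minimal sketch of this idea, using a hypothetical toy weight vector rather than the trained model's weights:

```python
import numpy as np

def max_activation_pattern(weights):
    """Unit-norm input that maximizes a single dense node's
    pre-activation w.x + b: the direction of the weight vector itself."""
    w = np.asarray(weights, dtype=np.float64).ravel()
    return w / np.linalg.norm(w)

# Toy 4-pixel "image" standing in for the 784 real weights.
w = np.array([0.5, -1.0, 2.0, 0.0])
pattern = max_activation_pattern(w)
print(pattern)
```

With the trained model, the same computation would use the hidden layer's weight vector (e.g. `model.layers[0].get_weights()[0]`), reshaped to 28x28 for display.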

EXPERIMENT 2: This time our dense neural network will have 784 input nodes, a hidden layer with 2 nodes, and 10 output nodes (corresponding to the 10 digits). For each training image, the outputs of the two hidden nodes are plotted using a scatterplot. We color-code the points according to which of the 10 classes the model predicts. Ideally, just as in EXPERIMENT 1, the color clusters should have very little overlap. Also compare the accuracy and confusion matrix of Experiments 1 and 2. Again, the goal is to gain more insights.
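
The scatterplot described above can be sketched as follows; the activation values and class labels here are synthetic stand-ins for the real hidden-node outputs and predictions:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for the sketch
import matplotlib.pyplot as plt

def plot_hidden_activations(h, pred_classes):
    """Scatter the two hidden-node activations, one point per image,
    color-coded by the predicted class (10 colors)."""
    fig, ax = plt.subplots(figsize=(8, 8))
    sc = ax.scatter(h[:, 0], h[:, 1], c=pred_classes, cmap='tab10',
                    s=4, alpha=0.5)
    fig.colorbar(sc, ax=ax, label='predicted class')
    ax.set_xlabel('hidden node 1 activation')
    ax.set_ylabel('hidden node 2 activation')
    return fig

# Synthetic stand-in data; the real values come from an activation
# model built over the trained network, as in Experiment 1.
rng = np.random.default_rng(1)
fig = plot_hidden_activations(rng.random((500, 2)),
                              rng.integers(0, 10, 500))
```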

EXPERIMENT 3: You can explore models with more hidden nodes. Then settle on one 'final' model, i.e. your 'best' model.

EXPERIMENT 4: Use PCA decomposition to reduce the number of dimensions of our training set of 28x28 MNIST images from 784 to 154 (with 95% of the training images' variance lying along these components). We also reduce the 'best' model from Experiment 3 to 154 input nodes and train it on the new lower-dimensional data. We then compare the performance of Experiments 3 and 4.
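
The PCA step can be sketched with scikit-learn as follows; synthetic data stands in for the flattened MNIST images, so the number of retained components will differ from the 154 obtained on the real training set:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the flattened 28x28 training images.
rng = np.random.default_rng(42)
X = rng.random((1000, 784)).astype('float32')

# A fractional n_components keeps the smallest number of components
# explaining at least that share of the variance (95% here).
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)
```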

EXPERIMENT 5: We use a Random Forest classifier to get the relative importance of the 784 features (pixels) of the 28x28 MNIST training images and select the top 70 features (pixels). We train our 'best' dense neural network using these 70 features and compare its performance to the dense neural network models from EXPERIMENTS 3 and 4.
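
The feature-selection step can be sketched as follows, again with synthetic stand-in data in place of the MNIST training set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 500 "images" of 784 pixels, labels 0-9.
rng = np.random.default_rng(7)
X = rng.random((500, 784))
y = rng.integers(0, 10, 500)

forest = RandomForestClassifier(n_estimators=50, random_state=42)
forest.fit(X, y)

# Indices of the 70 most important pixels, most important first,
# then the training set restricted to those pixels.
top70 = np.argsort(forest.feature_importances_)[::-1][:70]
X_top70 = X[:, top70]
print(X_top70.shape)
```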

In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:80% !important; }</style>"))

Importing Packages¶

  • First we import all the packages that will be used in the assignment.

  • Since Keras is integrated into TensorFlow 2.x, we import keras from tensorflow and use tensorflow.keras.xxx to import all other Keras packages. The seed argument produces a deterministic sequence of tensors across multiple calls.

In [2]:
import datetime
from packaging import version
from collections import Counter
import numpy as np
import pandas as pd
import random

import matplotlib as mpl  # EA
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import confusion_matrix, classification_report
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_squared_error as MSE
from sklearn.metrics import accuracy_score

import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow import keras
from tensorflow.keras import models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
import tensorflow.keras.backend as k
from tensorflow.python.client import device_lib
import warnings
warnings.filterwarnings('ignore')
In [3]:
%matplotlib inline
np.set_printoptions(precision=3, suppress=True) 

Verify TensorFlow version¶

In [4]:
print("This notebook requires TensorFlow 2.0 or above")
print("TensorFlow version: ", tf.__version__)
assert version.parse(tf.__version__).release[0] >=2
This notebook requires TensorFlow 2.0 or above
TensorFlow version:  2.12.0

Mount Google Drive to Colab environment¶

In [5]:
#from google.colab import drive
#drive.mount('/content/gdrive')

Research Assignment Reporting Functions¶

In [6]:
def print_validation_report(test_labels, predictions):
    print("Classification Report")
    print(classification_report(test_labels, predictions))
    print('Accuracy Score: {}'.format(accuracy_score(test_labels, predictions)))
    print('Root Mean Square Error: {}'.format(np.sqrt(MSE(test_labels, predictions))))

def plot_confusion_matrix(y_true, y_pred):
    mtx = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots(figsize=(16,12))
    sns.heatmap(mtx, annot=True, fmt='d', linewidths=.75,  cbar=False, ax=ax,cmap='Blues',linecolor='white')
    #  square=True,
    plt.ylabel('true label')
    plt.xlabel('predicted label')

    return mtx

def plot_history(history):
  losses = history.history['loss']
  accs = history.history['accuracy']
  val_losses = history.history['val_loss']
  val_accs = history.history['val_accuracy']
  epochs = len(losses)

  plt.figure(figsize=(16, 4))
  for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
  plt.show()

def plot_digits(instances, pos, images_per_row=5, **options):
    size = 28
    images_per_row = min(len(instances), images_per_row)
    images = [instance.reshape(size,size) for instance in instances]
    n_rows = (len(instances) - 1) // images_per_row + 1
    row_images = []
    n_empty = n_rows * images_per_row - len(instances)
    images.append(np.zeros((size, size * n_empty)))
    for row in range(n_rows):
        rimages = images[row * images_per_row : (row + 1) * images_per_row]
        row_images.append(np.concatenate(rimages, axis=1))
    image = np.concatenate(row_images, axis=0)
    pos.imshow(image, cmap = 'binary', **options)
    pos.axis("off")

def plot_digit(data):
    image = data.reshape(28, 28)
    plt.imshow(image, cmap = 'hot',
               interpolation="nearest")
    plt.axis("off")

def display_training_curves(training, validation, title, subplot):
    ax = plt.subplot(subplot)
    ax.plot(training)
    ax.plot(validation)
    ax.set_title('model '+ title)
    ax.set_ylabel(title)
    ax.set_xlabel('epoch')
    ax.legend(['training', 'validation'])
In [7]:
seed_val = 43
np.random.seed(seed_val)
random.seed(seed_val)
tf.random.set_seed(seed_val)

Loading MNIST Dataset¶

  • The MNIST dataset of handwritten digits has a training set of 60,000 images and a test set of 10,000 images. It comes prepackaged as part of tf.keras. Use tf.keras.datasets.mnist.load_data to get these datasets (and the corresponding labels) as NumPy arrays.
In [8]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
  • Tuples of Numpy arrays: (x_train, y_train), (x_test, y_test)
  • x_train, x_test: uint8 arrays of grayscale image data with shapes (num_samples, 28, 28).
  • y_train, y_test: uint8 arrays of digit labels (integers in range 0-9)

EDA Training and Test Sets¶

  • Inspect the training and test sets as well as their labels as follows.
In [9]:
print('x_train:\t{}'.format(train_images.shape))
print('y_train:\t{}'.format(train_labels.shape))
print('x_test:\t\t{}'.format(test_images.shape))
print('y_test:\t\t{}'.format(test_labels.shape))
x_train:	(60000, 28, 28)
y_train:	(60000,)
x_test:		(10000, 28, 28)
y_test:		(10000,)
In [10]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

Validation Dataset¶

  • Create validation set from training set: 5000 images
In [11]:
val_images, train_images = train_images[:5000], train_images[5000:] 
val_labels, train_labels = train_labels[:5000], train_labels[5000:]

Review labels for training set¶

In [12]:
print("First ten labels training dataset:\n {}\n".format(train_labels[0:10]))
First ten labels training dataset:
 [7 3 4 6 1 8 1 0 9 8]


Find frequency of each label in training and test sets¶

In [13]:
# reload as we have removed 5000 for validation
(train_images_dist, train_labels_dist), (test_images_dist, test_labels_dist) = mnist.load_data()
In [14]:
plt.figure(figsize = (12 ,8))
items = [{'Class': x, 'Count': y} for x, y in Counter(train_labels_dist).items()]
distribution = pd.DataFrame(items).sort_values(['Class'])
sns.barplot(x=distribution.Class, y=distribution.Count);
[Figure: bar plot of digit class counts in the training set]
In [15]:
Counter(train_labels_dist).most_common()
Out[15]:
[(1, 6742),
 (7, 6265),
 (3, 6131),
 (2, 5958),
 (9, 5949),
 (0, 5923),
 (6, 5918),
 (8, 5851),
 (4, 5842),
 (5, 5421)]
In [16]:
Counter(test_labels_dist).most_common()
Out[16]:
[(1, 1135),
 (2, 1032),
 (7, 1028),
 (3, 1010),
 (9, 1009),
 (4, 982),
 (0, 980),
 (8, 974),
 (6, 958),
 (5, 892)]

Plot sample images with their labels¶

In [17]:
fig = plt.figure(figsize = (15, 9))

for i in range(50):
    plt.subplot(5, 10, 1+i)
    plt.title(train_labels_dist[i])
    plt.xticks([])
    plt.yticks([])
    plt.imshow(train_images_dist[i].reshape(28,28), cmap='binary')
[Figure: 50 sample training images with their labels]
In [18]:
np.set_printoptions(linewidth=np.inf)
print("{}".format(train_images_dist[2020]))
[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0 167 208  19   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0  13 235 254  99   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0  74 254 234   4   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0 154 254 145   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0 224 254  92   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0  51 245 211  13   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   2 169 254 101   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0  27 254 254  88   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0  72 255 241  15   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0  88 254 153   0   0  33  53 155 156 102  15   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 130 254  31   0 128 235 254 254 254 254 186  10   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 190 254  51 178 254 246 213 111 109 186 254 145   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 192 254 229 254 216  90   0   0   0  57 254 234   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 235 254 254 247  85   0   0   0   0  32 254 234   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 235 254 254 118   0   0   0   0   0 107 254 201   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 235 255 254 102  12   0   0   0   8 188 248 119   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 207 254 254 238 107   0   0  39 175 254 148   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0  84 254 248  74  11  32 115 238 254 176  11   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0  21 214 254 254 254 254 254 254 132   6   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0  14  96 176 254 254 214  48  12   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]]

Creating the DNN Model¶

  • In this step, we first choose the network architecture for the model. Then we build, compile, train, and evaluate the model.

Build the DNN model¶

We use the Sequential class defined in Keras to create our model. All the layers are going to be Dense layers: every node of a layer is connected to every node of the preceding layer, i.e. densely connected.
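
The forward pass of a dense layer is just a matrix multiply plus a bias, passed through an activation. A minimal NumPy sketch of the two-layer architecture used here, with random, untrained weights standing in for the learned ones:

```python
import numpy as np

def dense_forward(x, W, b, activation=None):
    """Forward pass of a fully connected layer: every output unit
    sees every input unit via one weight-matrix multiply."""
    z = x @ W + b
    if activation == 'relu':
        return np.maximum(z, 0.0)
    if activation == 'softmax':
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    return z

# One flattened 784-pixel image through a 1-node hidden layer,
# then a 10-node softmax output layer.
rng = np.random.default_rng(0)
x = rng.random((1, 784)).astype('float32')
W1, b1 = rng.normal(size=(784, 1)), np.zeros(1)
W2, b2 = rng.normal(size=(1, 10)), np.zeros(10)
hidden = dense_forward(x, W1, b1, 'relu')
probs = dense_forward(hidden, W2, b2, 'softmax')
```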

After the model is built, we view its summary and a diagram of its architecture.

Experiment 1¶

  • 784 Input Nodes
  • hidden layer: 1 node
  • output layer: 10 nodes
In [19]:
# k.clear_session()
model = Sequential([
    Dense(name = 'hidden_layer_1', units=1, activation='relu', input_shape=[784]),
    Dense(name = 'output_layer', units = 10, activation ='softmax')
])
In [20]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 hidden_layer_1 (Dense)      (None, 1)                 785       
                                                                 
 output_layer (Dense)        (None, 10)                20        
                                                                 
=================================================================
Total params: 805
Trainable params: 805
Non-trainable params: 0
_________________________________________________________________
In [21]:
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True)
Out[21]:
[Image: plot_model diagram of the network]

Compile the DNN model¶

In addition to setting up our model architecture, we also need to define which algorithm the model should use to optimize the weights and biases given the data. We will use RMSProp, an adaptive variant of stochastic gradient descent.

We also need to define a loss function. Think of this function as measuring the difference between the predicted outputs and the actual outputs given in the dataset. This loss needs to be minimized in order to achieve higher model accuracy. That is what the optimization algorithm essentially does: it minimizes the loss during model training. For our multi-class classification problem with integer labels, sparse categorical cross-entropy is used.

Finally, we will track accuracy as a metric as the model trains.
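
The loss Keras computes can be reproduced by hand: sparse categorical cross-entropy is the mean negative log-probability the model assigns to each true (integer) class label. A small sketch:

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Mean negative log-probability assigned to the true class,
    with integer labels (as Keras' sparse_categorical_crossentropy)."""
    probs = np.asarray(probs, dtype=np.float64)
    return float(-np.mean(np.log(probs[np.arange(len(y_true)), y_true])))

# Two samples: one confident correct prediction, one uncertain.
probs = np.array([[0.9, 0.05, 0.05],
                  [0.3, 0.4, 0.3]])
y = np.array([0, 1])
loss = sparse_categorical_crossentropy(y, probs)
# loss = -(ln 0.9 + ln 0.4) / 2, roughly 0.51
```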

In [22]:
model.compile(optimizer='rmsprop',
               loss = 'sparse_categorical_crossentropy',
               metrics=['accuracy'])

Train the DNN model¶

tf.keras.model.fit
https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit

In [23]:
history = model.fit( train_images
    , train_labels
    , epochs=30
    , validation_data=(val_images, val_labels)
    , callbacks=[tf.keras.callbacks.ModelCheckpoint("exp_1_optimized.h5",save_best_only=True,save_weights_only=False)] 
    )
Epoch 1/30
 154/1719 [=>............................] - ETA: 0s - loss: 2.1843 - accuracy: 0.1613      
2024-10-06 23:58:08.350857: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
1719/1719 [==============================] - 1s 352us/step - loss: 1.9725 - accuracy: 0.2158 - val_loss: 1.8652 - val_accuracy: 0.2596
Epoch 2/30
1719/1719 [==============================] - 1s 294us/step - loss: 1.7900 - accuracy: 0.2817 - val_loss: 1.7219 - val_accuracy: 0.2974
Epoch 3/30
1719/1719 [==============================] - 1s 293us/step - loss: 1.6930 - accuracy: 0.2992 - val_loss: 1.6711 - val_accuracy: 0.2974
Epoch 4/30
1719/1719 [==============================] - 1s 296us/step - loss: 1.6588 - accuracy: 0.3037 - val_loss: 1.6456 - val_accuracy: 0.3006
Epoch 5/30
1719/1719 [==============================] - 0s 289us/step - loss: 1.6414 - accuracy: 0.3058 - val_loss: 1.6329 - val_accuracy: 0.3008
Epoch 6/30
1719/1719 [==============================] - 0s 290us/step - loss: 1.6302 - accuracy: 0.3091 - val_loss: 1.6230 - val_accuracy: 0.3024
Epoch 7/30
1719/1719 [==============================] - 0s 289us/step - loss: 1.6219 - accuracy: 0.3111 - val_loss: 1.6216 - val_accuracy: 0.3054
Epoch 8/30
1719/1719 [==============================] - 1s 295us/step - loss: 1.6156 - accuracy: 0.3131 - val_loss: 1.6163 - val_accuracy: 0.2998
Epoch 9/30
1719/1719 [==============================] - 1s 294us/step - loss: 1.6107 - accuracy: 0.3172 - val_loss: 1.6093 - val_accuracy: 0.3336
Epoch 10/30
1719/1719 [==============================] - 1s 298us/step - loss: 1.6055 - accuracy: 0.3262 - val_loss: 1.6089 - val_accuracy: 0.3396
Epoch 11/30
1719/1719 [==============================] - 1s 294us/step - loss: 1.6012 - accuracy: 0.3371 - val_loss: 1.6012 - val_accuracy: 0.3458
Epoch 12/30
1719/1719 [==============================] - 1s 293us/step - loss: 1.5968 - accuracy: 0.3450 - val_loss: 1.5939 - val_accuracy: 0.3520
Epoch 13/30
1719/1719 [==============================] - 1s 336us/step - loss: 1.5913 - accuracy: 0.3499 - val_loss: 1.5872 - val_accuracy: 0.3648
Epoch 14/30
1719/1719 [==============================] - 1s 295us/step - loss: 1.5856 - accuracy: 0.3554 - val_loss: 1.5802 - val_accuracy: 0.3684
Epoch 15/30
1719/1719 [==============================] - 1s 292us/step - loss: 1.5814 - accuracy: 0.3578 - val_loss: 1.5831 - val_accuracy: 0.3606
Epoch 16/30
1719/1719 [==============================] - 1s 294us/step - loss: 1.5784 - accuracy: 0.3550 - val_loss: 1.5746 - val_accuracy: 0.3712
Epoch 17/30
1719/1719 [==============================] - 1s 293us/step - loss: 1.5751 - accuracy: 0.3589 - val_loss: 1.5716 - val_accuracy: 0.3774
Epoch 18/30
1719/1719 [==============================] - 1s 293us/step - loss: 1.5733 - accuracy: 0.3597 - val_loss: 1.5781 - val_accuracy: 0.3848
Epoch 19/30
1719/1719 [==============================] - 1s 292us/step - loss: 1.5713 - accuracy: 0.3608 - val_loss: 1.5694 - val_accuracy: 0.3722
Epoch 20/30
1719/1719 [==============================] - 0s 290us/step - loss: 1.5699 - accuracy: 0.3638 - val_loss: 1.5667 - val_accuracy: 0.3758
Epoch 21/30
1719/1719 [==============================] - 1s 294us/step - loss: 1.5690 - accuracy: 0.3641 - val_loss: 1.5665 - val_accuracy: 0.3886
Epoch 22/30
1719/1719 [==============================] - 1s 291us/step - loss: 1.5675 - accuracy: 0.3681 - val_loss: 1.5669 - val_accuracy: 0.3852
Epoch 23/30
1719/1719 [==============================] - 1s 294us/step - loss: 1.5659 - accuracy: 0.3733 - val_loss: 1.5640 - val_accuracy: 0.3926
Epoch 24/30
1719/1719 [==============================] - 1s 292us/step - loss: 1.5630 - accuracy: 0.3817 - val_loss: 1.5590 - val_accuracy: 0.3988
Epoch 25/30
1719/1719 [==============================] - 1s 294us/step - loss: 1.5572 - accuracy: 0.3922 - val_loss: 1.5570 - val_accuracy: 0.4018
Epoch 26/30
1719/1719 [==============================] - 1s 291us/step - loss: 1.5491 - accuracy: 0.3905 - val_loss: 1.5549 - val_accuracy: 0.3924
Epoch 27/30
1719/1719 [==============================] - 1s 292us/step - loss: 1.5430 - accuracy: 0.3887 - val_loss: 1.5384 - val_accuracy: 0.3910
Epoch 28/30
1719/1719 [==============================] - 1s 292us/step - loss: 1.5388 - accuracy: 0.3871 - val_loss: 1.5353 - val_accuracy: 0.3860
Epoch 29/30
1719/1719 [==============================] - 1s 292us/step - loss: 1.5348 - accuracy: 0.3846 - val_loss: 1.5270 - val_accuracy: 0.3842
Epoch 30/30
1719/1719 [==============================] - 1s 291us/step - loss: 1.5322 - accuracy: 0.3840 - val_loss: 1.5261 - val_accuracy: 0.3870
In [24]:
model = tf.keras.models.load_model("exp_1_optimized.h5")

Evaluate the DNN model¶

In order to ensure that the model has not simply "memorized" the training data, we should evaluate its performance on the test set. This is easy to do: we simply use the evaluate method on our model.

In [25]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'test acc: {test_acc}, test loss: {test_loss}')
313/313 [==============================] - 0s 252us/step - loss: 1.5491 - accuracy: 0.3818
test acc: 0.38179999589920044, test loss: 1.5491249561309814

Plot performance metrics¶

We use Matplotlib to create two plots, displaying the training and validation loss (resp. accuracy) for each training epoch side by side.

In [26]:
history_dict = history.history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
In [27]:
plt.subplots(figsize=(16,12))
plt.tight_layout()
display_training_curves(history_dict['accuracy'], history_dict['val_accuracy'], 'accuracy', 211)
display_training_curves(history_dict['loss'], history_dict['val_loss'], 'loss', 212)
[Figure: training and validation accuracy and loss curves]

Making Predictions¶

In [28]:
# Get the predicted classes:
# model.predict_classes() is deprecated in TensorFlow 2.7+
# pred_classes = model.predict(train_images)
# alternate method:
pred_train=model.predict(train_images) 
pred_classes=np.argmax(pred_train, axis=1)
1719/1719 [==============================] - 0s 189us/step
In [29]:
print_validation_report(train_labels, pred_classes)
Classification Report
              precision    recall  f1-score   support

           0       0.52      0.45      0.48      5444
           1       0.43      0.88      0.58      6179
           2       0.23      0.38      0.29      5470
           3       0.27      0.53      0.36      5638
           4       0.00      0.00      0.00      5307
           5       0.00      0.00      0.00      4987
           6       0.38      0.45      0.41      5417
           7       0.55      0.86      0.67      5715
           8       0.00      0.00      0.00      5389
           9       0.35      0.16      0.22      5454

    accuracy                           0.38     55000
   macro avg       0.27      0.37      0.30     55000
weighted avg       0.28      0.38      0.31     55000

Accuracy Score: 0.3846
Root Mean Square Error: 3.233677894792975

Create the confusion matrix¶

Let us see what the confusion matrix looks like, computing it with both tf.math.confusion_matrix and sklearn.metrics. Then we visualize the confusion matrix and see what it tells us.

In [30]:
conf_mx = tf.math.confusion_matrix(train_labels, pred_classes)
conf_mx;
In [31]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(pred_train[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
Out[31]:
  0 1 2 3 4 5 6 7 8 9
0 0.00% 8.83% 0.05% 0.39% 4.47% 0.06% 0.00% 55.68% 0.28% 30.24%
1 0.64% 14.44% 12.32% 22.97% 14.98% 11.79% 1.42% 0.04% 20.24% 1.16%
2 0.00% 16.55% 0.19% 1.17% 9.21% 0.20% 0.00% 37.29% 0.86% 34.52%
3 22.75% 0.04% 19.22% 6.16% 0.11% 15.21% 29.37% 0.00% 7.14% 0.00%
4 0.09% 30.87% 4.34% 13.44% 24.71% 4.38% 0.24% 1.35% 10.94% 9.64%
5 5.12% 1.52% 23.26% 19.24% 2.39% 20.39% 8.83% 0.00% 19.24% 0.01%
6 0.12% 29.47% 5.03% 14.76% 24.26% 5.06% 0.30% 0.95% 12.12% 7.94%
7 44.25% 0.00% 6.66% 0.81% 0.00% 4.75% 42.44% 0.00% 1.09% 0.00%
8 0.00% 18.14% 0.23% 1.40% 10.27% 0.25% 0.00% 34.04% 1.02% 34.65%
9 6.96% 0.90% 24.01% 17.00% 1.54% 20.70% 11.45% 0.00% 17.42% 0.01%
10 37.92% 0.00% 10.28% 1.74% 0.01% 7.59% 40.24% 0.00% 2.22% 0.00%
11 7.25% 0.84% 24.07% 16.69% 1.45% 20.70% 11.85% 0.00% 17.15% 0.00%
12 0.03% 33.30% 2.14% 8.28% 23.83% 2.22% 0.08% 4.91% 6.51% 18.69%
13 16.04% 0.14% 22.44% 9.56% 0.31% 18.32% 22.59% 0.00% 10.60% 0.00%
14 0.00% 8.83% 0.05% 0.39% 4.47% 0.06% 0.00% 55.68% 0.28% 30.24%
15 35.47% 0.00% 11.75% 2.22% 0.01% 8.79% 38.96% 0.00% 2.80% 0.00%
16 3.86% 2.33% 22.19% 20.97% 3.42% 19.74% 6.93% 0.00% 20.54% 0.03%
17 0.00% 8.83% 0.05% 0.39% 4.47% 0.06% 0.00% 55.68% 0.28% 30.24%
18 44.95% 0.00% 6.29% 0.73% 0.00% 4.46% 42.57% 0.00% 0.99% 0.00%
19 59.92% 0.00% 0.89% 0.03% 0.00% 0.55% 38.56% 0.00% 0.05% 0.00%

Visualize the confusion matrix¶

We use code from chapter 3 of Hands on Machine Learning (A. Geron) (cf. https://github.com/ageron/handson-ml2/blob/master/03_classification.ipynb) to display a "heat map" of the confusion matrix. Then we normalize the confusion matrix so we can compare error rates.

See https://learning.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ch03.html#classification_chapter

In [32]:
mtx = plot_confusion_matrix(train_labels,pred_classes)
[Figure: confusion matrix heatmap]

Get Activation Values of the Hidden Node¶

To get the activation values of the hidden node, we need to create a new model, activation_model, that takes the same input as our current model but outputs the activation value of the hidden layer, i.e. of the single hidden node. Then we use the predict function to get the activation values.

In [33]:
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

print(f"There are {len(layer_outputs)} layers")
layer_outputs # description of the layers
There are 2 layers
Out[33]:
[<KerasTensor: shape=(None, 1) dtype=float32 (created by layer 'hidden_layer_1')>,
 <KerasTensor: shape=(None, 10) dtype=float32 (created by layer 'output_layer')>]
In [34]:
# Get the output of the hidden node for each of the 55000 training images
activations = activation_model.predict(train_images)
hidden_layer_activation = activations[0]
hidden_layer_activation.shape   #  hidden node has one activation value per training image
1719/1719 [==============================] - 0s 217us/step
Out[34]:
(55000, 1)
In [35]:
print(f"The maximum activation value of the hidden node is {hidden_layer_activation.max()}")
The maximum activation value of the hidden node is 14.317005157470703
In [36]:
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True)  # display probabilities as decimals and NOT in scientific notation
output_layer_activation = activations[1]
print(f"The output node has shape {output_layer_activation.shape}")
print(f"The output for the first image are {output_layer_activation[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activation[0].sum()}")
The output node has shape (55000, 10)
The output for the first image are [0.    0.088 0.001 0.004 0.045 0.001 0.    0.557 0.003 0.302]
The sum of the probabilities is (approximately) 1.0
In [37]:
boxplot_df = pd.DataFrame({'act_value':hidden_layer_activation.reshape(55000),
                           'pred_class':pred_classes})
boxplot_df.head()
Out[37]:
act_value pred_class
0 0.000000 7
1 2.302789 3
2 0.304156 7
3 5.188391 6
4 1.470149 1

Visualize the activation values with boxplots¶

We get the activation values of the first hidden node and combine them with the corresponding class labels into a DataFrame. We use both matplotlib and seaborn to create boxplots from the dataframe.

In [38]:
# To see how closely the hidden nodes activation values correlate with the class predictions
# Note that there were no 5s detected and that there were outliers for the activation values for the 6s
boxplot_df[['act_value','pred_class']].boxplot(by ='pred_class', column =['act_value'], grid = True) 
Out[38]:
<Axes: title={'center': 'act_value'}, xlabel='pred_class'>
[Figure: boxplots of hidden-node activation values by predicted class]
In [39]:
boxplot_df['pred_class'].value_counts() # Another way to verify what the boxplot is telling us
Out[39]:
pred_class
1    12614
3    10871
7     8982
2     8968
6     6436
0     4676
9     2453
Name: count, dtype: int64
In [40]:
# Let us use seaborn for the boxplots this time.
plt.figure(figsize=(16,10))
bplot = sns.boxplot(y='act_value', x='pred_class', 
                 data=boxplot_df, 
                 width=0.5,
                 palette="colorblind")
[Figure: seaborn boxplots of activation values by predicted class]
In [41]:
cl_a, cl_b = 1, 9
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]
plt.figure(figsize=(8,8))

p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)

plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);


p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")

# plt.savefig("error_analysis_digits_plot_EXP1_valid")

plt.show()
[Figure: 5×5 digit grids for each actual/predicted combination of 1s and 9s]
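The four boolean masks above are rebuilt by hand for every class pair; they can be factored into a small helper. A sketch (pair_masks is a hypothetical name, not a function from this notebook):

```python
import numpy as np

def pair_masks(labels, preds, cl_a, cl_b):
    """Boolean masks for the four (actual, predicted) combinations of two classes."""
    return {
        (cl_a, cl_a): (labels == cl_a) & (preds == cl_a),
        (cl_a, cl_b): (labels == cl_a) & (preds == cl_b),
        (cl_b, cl_a): (labels == cl_b) & (preds == cl_a),
        (cl_b, cl_b): (labels == cl_b) & (preds == cl_b),
    }

# Tiny example: true labels vs. predictions for four images
labels = np.array([1, 1, 9, 9])
preds = np.array([1, 9, 1, 9])
masks = pair_masks(labels, preds, 1, 9)
print(masks[(1, 9)])  # mask for 1s predicted as 9s
```

Each mask could then be used to index train_images before calling plot_digits, as in the cells above.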

Experiment 2¶

  • 784 input nodes
  • Hidden layer: 2 nodes
  • Output layer: 10 nodes
Build the DNN model¶

In [42]:
# k.clear_session()
model = Sequential([
    Dense(name='hidden_layer_1', units=2, activation='relu', input_shape=[784]),
    Dense(name='output_layer', units=10, activation='softmax')
])
In [43]:
model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 hidden_layer_1 (Dense)      (None, 2)                 1570      
                                                                 
 output_layer (Dense)        (None, 10)                30        
                                                                 
=================================================================
Total params: 1,600
Trainable params: 1,600
Non-trainable params: 0
_________________________________________________________________
In [44]:
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True)
Out[44]:
[Figure: model architecture diagram saved as mnist_model.png]

Compile the DNN model¶

In [45]:
model.compile(optimizer='rmsprop',
               loss = 'sparse_categorical_crossentropy',
               metrics=['accuracy'])

Train the DNN model¶

In [46]:
history = model.fit( train_images
    , train_labels
    , epochs=30
    , validation_data=(val_images, val_labels)
    , callbacks=[tf.keras.callbacks.ModelCheckpoint("exp_2_optimized.h5",save_best_only=True,save_weights_only=False)] 
    )
Epoch 1/30
1719/1719 [==============================] - 1s 333us/step - loss: 1.6624 - accuracy: 0.4288 - val_loss: 1.4240 - val_accuracy: 0.5120
Epoch 2/30
1719/1719 [==============================] - 1s 303us/step - loss: 1.3540 - accuracy: 0.5343 - val_loss: 1.2949 - val_accuracy: 0.5608
Epoch 3/30
1719/1719 [==============================] - 1s 304us/step - loss: 1.2780 - accuracy: 0.5690 - val_loss: 1.2404 - val_accuracy: 0.5916
Epoch 4/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.2256 - accuracy: 0.6023 - val_loss: 1.1941 - val_accuracy: 0.6274
Epoch 5/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.1894 - accuracy: 0.6173 - val_loss: 1.1670 - val_accuracy: 0.6338
Epoch 6/30
1719/1719 [==============================] - 1s 310us/step - loss: 1.1712 - accuracy: 0.6231 - val_loss: 1.1501 - val_accuracy: 0.6368
Epoch 7/30
1719/1719 [==============================] - 1s 304us/step - loss: 1.1608 - accuracy: 0.6248 - val_loss: 1.1494 - val_accuracy: 0.6334
Epoch 8/30
1719/1719 [==============================] - 1s 303us/step - loss: 1.1527 - accuracy: 0.6286 - val_loss: 1.1364 - val_accuracy: 0.6454
Epoch 9/30
1719/1719 [==============================] - 1s 305us/step - loss: 1.1463 - accuracy: 0.6318 - val_loss: 1.1326 - val_accuracy: 0.6502
Epoch 10/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.1411 - accuracy: 0.6349 - val_loss: 1.1307 - val_accuracy: 0.6372
Epoch 11/30
1719/1719 [==============================] - 1s 300us/step - loss: 1.1361 - accuracy: 0.6394 - val_loss: 1.1234 - val_accuracy: 0.6432
Epoch 12/30
1719/1719 [==============================] - 1s 301us/step - loss: 1.1308 - accuracy: 0.6419 - val_loss: 1.1199 - val_accuracy: 0.6514
Epoch 13/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.1245 - accuracy: 0.6431 - val_loss: 1.1118 - val_accuracy: 0.6590
Epoch 14/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.1191 - accuracy: 0.6453 - val_loss: 1.1059 - val_accuracy: 0.6592
Epoch 15/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.1145 - accuracy: 0.6466 - val_loss: 1.1011 - val_accuracy: 0.6664
Epoch 16/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.1103 - accuracy: 0.6503 - val_loss: 1.0981 - val_accuracy: 0.6658
Epoch 17/30
1719/1719 [==============================] - 1s 301us/step - loss: 1.1060 - accuracy: 0.6510 - val_loss: 1.0925 - val_accuracy: 0.6716
Epoch 18/30
1719/1719 [==============================] - 1s 299us/step - loss: 1.1023 - accuracy: 0.6514 - val_loss: 1.0949 - val_accuracy: 0.6718
Epoch 19/30
1719/1719 [==============================] - 1s 301us/step - loss: 1.0991 - accuracy: 0.6541 - val_loss: 1.0878 - val_accuracy: 0.6758
Epoch 20/30
1719/1719 [==============================] - 1s 301us/step - loss: 1.0965 - accuracy: 0.6548 - val_loss: 1.0837 - val_accuracy: 0.6768
Epoch 21/30
1719/1719 [==============================] - 1s 301us/step - loss: 1.0939 - accuracy: 0.6565 - val_loss: 1.0833 - val_accuracy: 0.6738
Epoch 22/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.0910 - accuracy: 0.6576 - val_loss: 1.0902 - val_accuracy: 0.6704
Epoch 23/30
1719/1719 [==============================] - 1s 303us/step - loss: 1.0897 - accuracy: 0.6583 - val_loss: 1.0780 - val_accuracy: 0.6806
Epoch 24/30
1719/1719 [==============================] - 1s 299us/step - loss: 1.0870 - accuracy: 0.6583 - val_loss: 1.0789 - val_accuracy: 0.6710
Epoch 25/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.0848 - accuracy: 0.6573 - val_loss: 1.0755 - val_accuracy: 0.6714
Epoch 26/30
1719/1719 [==============================] - 1s 300us/step - loss: 1.0827 - accuracy: 0.6572 - val_loss: 1.0761 - val_accuracy: 0.6684
Epoch 27/30
1719/1719 [==============================] - 1s 304us/step - loss: 1.0806 - accuracy: 0.6588 - val_loss: 1.0692 - val_accuracy: 0.6678
Epoch 28/30
1719/1719 [==============================] - 1s 299us/step - loss: 1.0793 - accuracy: 0.6573 - val_loss: 1.0870 - val_accuracy: 0.6570
Epoch 29/30
1719/1719 [==============================] - 1s 302us/step - loss: 1.0762 - accuracy: 0.6571 - val_loss: 1.0618 - val_accuracy: 0.6736
Epoch 30/30
1719/1719 [==============================] - 1s 301us/step - loss: 1.0743 - accuracy: 0.6571 - val_loss: 1.0690 - val_accuracy: 0.6584

Evaluate the DNN model¶

In [47]:
model = tf.keras.models.load_model("exp_2_optimized.h5")
test_loss, test_acc = model.evaluate(test_images, test_labels)
313/313 [==============================] - 0s 262us/step - loss: 1.0713 - accuracy: 0.6571
In [48]:
print(f'test acc: {test_acc}, test loss: {test_loss}')
test acc: 0.6571000218391418, test loss: 1.0712623596191406

Plot performance metrics¶

In [49]:
history_dict = history.history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
In [50]:
plt.subplots(figsize=(16,12))
plt.tight_layout()
display_training_curves(history_dict['accuracy'], history_dict['val_accuracy'], 'accuracy', 211)
display_training_curves(history_dict['loss'], history_dict['val_loss'], 'loss', 212)
[Figure: training vs. validation accuracy and loss curves]

Making Predictions¶

In [51]:
# Get the predicted classes:
# model.predict_classes() is deprecated in TensorFlow 2.7+,
# so take the argmax of the predicted probabilities instead:
pred_train = model.predict(train_images)
pred_classes = np.argmax(pred_train, axis=1)
1719/1719 [==============================] - 0s 192us/step
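The argmax step above simply picks the highest-probability column in each row of the predicted probabilities. A minimal illustration with made-up softmax outputs:

```python
import numpy as np

# Made-up softmax outputs for three images over three classes (rows sum to 1)
probs = np.array([[0.10, 0.70, 0.20],
                  [0.05, 0.15, 0.80],
                  [0.60, 0.30, 0.10]])

pred = np.argmax(probs, axis=1)  # index of the most probable class per row
print(pred)  # → [1 2 0]
```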
In [52]:
print_validation_report(train_labels, pred_classes)
Classification Report
              precision    recall  f1-score   support

           0       0.79      0.80      0.79      5444
           1       0.85      0.92      0.88      6179
           2       0.60      0.45      0.52      5470
           3       0.68      0.70      0.69      5638
           4       0.51      0.47      0.49      5307
           5       0.70      0.64      0.67      4987
           6       0.80      0.75      0.77      5417
           7       0.69      0.82      0.75      5715
           8       0.46      0.55      0.50      5389
           9       0.47      0.44      0.46      5454

    accuracy                           0.66     55000
   macro avg       0.66      0.65      0.65     55000
weighted avg       0.66      0.66      0.66     55000

Accuracy Score: 0.6596545454545455
Root Mean Square Error: 2.541706656416654

Create the confusion matrix¶

In [53]:
conf_mx = tf.math.confusion_matrix(train_labels, pred_classes)
conf_mx;
In [54]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(pred_train[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
Out[54]:
  0 1 2 3 4 5 6 7 8 9
0 0.00% 0.01% 9.70% 0.00% 3.15% 0.00% 0.06% 59.22% 0.10% 27.75%
1 0.08% 3.80% 0.04% 76.99% 0.26% 8.88% 0.02% 0.02% 9.76% 0.15%
2 0.28% 0.37% 15.27% 2.55% 22.50% 1.09% 1.23% 7.45% 16.41% 32.84%
3 2.98% 0.00% 20.11% 0.00% 24.97% 0.00% 50.53% 0.00% 0.28% 1.12%
4 0.00% 91.46% 0.04% 5.28% 0.07% 0.05% 0.00% 2.20% 0.35% 0.54%
5 20.26% 0.00% 1.18% 1.21% 8.80% 18.63% 6.24% 0.00% 43.24% 0.45%
6 0.00% 89.19% 0.04% 7.97% 0.08% 0.09% 0.00% 1.60% 0.51% 0.52%
7 90.02% 0.00% 0.01% 0.00% 0.25% 3.44% 3.83% 0.00% 2.45% 0.00%
8 0.18% 0.08% 20.17% 0.57% 23.18% 0.34% 1.31% 8.74% 8.20% 37.23%
9 32.44% 0.00% 1.07% 0.39% 8.64% 13.92% 9.56% 0.00% 33.71% 0.27%
10 82.97% 0.00% 0.02% 0.01% 0.47% 7.11% 4.15% 0.00% 5.27% 0.00%
11 1.20% 0.08% 0.10% 38.21% 0.88% 30.74% 0.17% 0.00% 28.48% 0.15%
12 0.00% 92.45% 0.04% 1.53% 0.04% 0.01% 0.00% 5.23% 0.11% 0.59%
13 1.27% 0.00% 0.00% 31.67% 0.08% 53.39% 0.03% 0.00% 13.55% 0.00%
14 0.00% 0.02% 8.92% 0.00% 3.02% 0.00% 0.06% 60.79% 0.11% 27.08%
15 84.10% 0.00% 0.04% 0.00% 0.79% 3.68% 7.06% 0.00% 4.32% 0.00%
16 0.00% 0.00% 62.59% 0.00% 10.19% 0.00% 1.14% 3.37% 0.01% 22.70%
17 0.00% 0.00% 36.39% 0.00% 7.54% 0.00% 0.35% 20.50% 0.03% 35.19%
18 33.77% 0.00% 0.48% 0.00% 2.43% 0.00% 63.20% 0.00% 0.10% 0.00%
19 93.83% 0.00% 0.00% 0.00% 0.14% 0.97% 4.24% 0.00% 0.82% 0.00%

Visualize the confusion matrix¶

In [55]:
mtx = plot_confusion_matrix(train_labels,pred_classes)
[Figure: confusion matrix heatmap]
In [56]:
cl_a, cl_b = 2, 3
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]
plt.figure(figsize=(8,8))

p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)

plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);


p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")

# plt.savefig("error_analysis_digits_plot_EXP1_valid")

plt.show()
[Figure: 5×5 digit grids for each actual/predicted combination of 2s and 3s]

Get Activation Values of the Hidden Nodes (2)¶

In [57]:
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

print(f"There are {len(layer_outputs)} layers")
layer_outputs # description of the layers
There are 2 layers
Out[57]:
[<KerasTensor: shape=(None, 2) dtype=float32 (created by layer 'hidden_layer_1')>,
 <KerasTensor: shape=(None, 10) dtype=float32 (created by layer 'output_layer')>]
In [58]:
# Get the outputs of the two hidden nodes for each of the 55000 training images
activations = activation_model.predict(train_images)
hidden_layer_activation = activations[0]
hidden_layer_activation.shape   #  each hidden node has one activation value per training image
1719/1719 [==============================] - 0s 219us/step
Out[58]:
(55000, 2)
In [59]:
hidden_node1_activation = hidden_layer_activation[:,0] # get activation values of the first hidden node
hidden_node2_activation = hidden_layer_activation[:,1] # get activation values of the second hidden node

print(f"The maximum activation value of the first hidden node is {hidden_node1_activation.max()}")
print(f"The maximum activation value of the second hidden node is {hidden_node2_activation.max()}")
The maximum activation value of the first hidden node is 22.321428298950195
The maximum activation value of the second hidden node is 75.70663452148438
In [60]:
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True)  # display probabilities as decimals and NOT in scientific notation
output_layer_activation = activations[1]
print(f"The output layer has shape {output_layer_activation.shape}")
print(f"The outputs for the first image are {output_layer_activation[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activation[0].sum()}")
The output layer has shape (55000, 10)
The outputs for the first image are [0.    0.    0.097 0.    0.032 0.    0.001 0.592 0.001 0.278]
The sum of the probabilities is (approximately) 1.0
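The probabilities sum to 1 because the output layer uses softmax. A stand-alone sketch of the computation (not taken from the notebook's code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p.round(3), round(p.sum(), 6))  # the entries always sum to 1
```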
In [61]:
scatterPlot_df =  pd.DataFrame({'act_value_h1':hidden_node1_activation,
                                'act_value_h2':hidden_node2_activation,
                                'pred_class':pred_classes})
scatterPlot_df.head()
Out[61]:
act_value_h1 act_value_h2 pred_class
0 0.000000 5.461303 7
1 4.111321 0.957018 3
2 2.540297 4.186171 9
3 6.147233 12.957750 6
4 0.769478 0.000000 1
In [62]:
# To see how closely the hidden node activation values correlate with the class predictions.
# NOTE: boxplot_df still holds Experiment 1's single-node activations; for Experiment 2,
# build a similar DataFrame from scatterPlot_df ('act_value_h1' or 'act_value_h2') instead.
boxplot_df[['act_value','pred_class']].boxplot(by ='pred_class', column =['act_value'], grid = True) 
Out[62]:
<Axes: title={'center': 'act_value'}, xlabel='pred_class'>
[Figure: matplotlib boxplot of act_value grouped by pred_class]
In [63]:
#plt.legend(loc='upper left', prop={'size':6}, bbox_to_anchor=(1,1),ncol=1)
plt.scatter(scatterPlot_df.act_value_h1, 
            scatterPlot_df.act_value_h2, 
            c=scatterPlot_df.pred_class,
            label=scatterPlot_df.pred_class)
plt.show()
[Figure: scatter plot of act_value_h1 vs. act_value_h2 colored by pred_class]
In [64]:
# To see how closely the hidden node activation values correlate with the class predictions.
# Let us use seaborn for the boxplots this time.
# NOTE: boxplot_df still holds Experiment 1's single-node activations, not Experiment 2's.
plt.figure(figsize=(16,10))
bplot = sns.boxplot(y='act_value', x='pred_class', 
                 data=boxplot_df, 
                 width=0.5,
                 palette="colorblind")
[Figure: seaborn boxplot of act_value by pred_class]
In [65]:
groups = scatterPlot_df.groupby('pred_class')

# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.act_value_h1, group.act_value_h2, marker='o', linestyle='', ms=12, label=name)
ax.legend()

plt.show()
[Figure: scatter plot of the two hidden-node activations, one series per predicted class]

Experiment 3¶

  • 784 input nodes
  • Hidden layer: 128 nodes
  • Output layer: 10 nodes
Build the DNN model¶

In [66]:
# k.clear_session()
model = Sequential([
    Dense(name='hidden_layer_1', units=128, activation='relu', input_shape=[784]),
    Dense(name='output_layer', units=10, activation='softmax')
])
In [67]:
model.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 hidden_layer_1 (Dense)      (None, 128)               100480    
                                                                 
 output_layer (Dense)        (None, 10)                1290      
                                                                 
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
In [68]:
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True)
Out[68]:
[Figure: model architecture diagram saved as mnist_model.png]

Compile the DNN model¶

In [69]:
model.compile(optimizer='rmsprop',
               loss = 'sparse_categorical_crossentropy',
               metrics=['accuracy'])

Train the DNN model¶

In [70]:
history = model.fit(train_images
    , train_labels
    , epochs=20
    , validation_data=(val_images, val_labels)
    , callbacks=[tf.keras.callbacks.ModelCheckpoint("exp_3_optimized.h5",save_best_only=True,save_weights_only=False)] 
    )
Epoch 1/20
1719/1719 [==============================] - 1s 605us/step - loss: 0.2735 - accuracy: 0.9216 - val_loss: 0.1381 - val_accuracy: 0.9632
Epoch 2/20
1719/1719 [==============================] - 1s 584us/step - loss: 0.1292 - accuracy: 0.9617 - val_loss: 0.1016 - val_accuracy: 0.9712
Epoch 3/20
1719/1719 [==============================] - 1s 585us/step - loss: 0.0939 - accuracy: 0.9725 - val_loss: 0.1014 - val_accuracy: 0.9706
Epoch 4/20
1719/1719 [==============================] - 1s 586us/step - loss: 0.0738 - accuracy: 0.9786 - val_loss: 0.0811 - val_accuracy: 0.9776
Epoch 5/20
1719/1719 [==============================] - 1s 585us/step - loss: 0.0637 - accuracy: 0.9821 - val_loss: 0.0842 - val_accuracy: 0.9772
Epoch 6/20
1719/1719 [==============================] - 1s 584us/step - loss: 0.0537 - accuracy: 0.9850 - val_loss: 0.0870 - val_accuracy: 0.9758
Epoch 7/20
1719/1719 [==============================] - 1s 583us/step - loss: 0.0469 - accuracy: 0.9870 - val_loss: 0.0868 - val_accuracy: 0.9762
Epoch 8/20
1719/1719 [==============================] - 1s 582us/step - loss: 0.0408 - accuracy: 0.9893 - val_loss: 0.0856 - val_accuracy: 0.9774
Epoch 9/20
1719/1719 [==============================] - 1s 583us/step - loss: 0.0364 - accuracy: 0.9901 - val_loss: 0.0926 - val_accuracy: 0.9776
Epoch 10/20
1719/1719 [==============================] - 1s 591us/step - loss: 0.0326 - accuracy: 0.9915 - val_loss: 0.0902 - val_accuracy: 0.9790
Epoch 11/20
1719/1719 [==============================] - 1s 590us/step - loss: 0.0286 - accuracy: 0.9923 - val_loss: 0.1014 - val_accuracy: 0.9762
Epoch 12/20
1719/1719 [==============================] - 1s 588us/step - loss: 0.0255 - accuracy: 0.9933 - val_loss: 0.1036 - val_accuracy: 0.9786
Epoch 13/20
1719/1719 [==============================] - 1s 582us/step - loss: 0.0233 - accuracy: 0.9942 - val_loss: 0.1133 - val_accuracy: 0.9772
Epoch 14/20
1719/1719 [==============================] - 1s 588us/step - loss: 0.0204 - accuracy: 0.9947 - val_loss: 0.1043 - val_accuracy: 0.9772
Epoch 15/20
1719/1719 [==============================] - 1s 586us/step - loss: 0.0191 - accuracy: 0.9954 - val_loss: 0.1082 - val_accuracy: 0.9762
Epoch 16/20
1719/1719 [==============================] - 1s 582us/step - loss: 0.0166 - accuracy: 0.9959 - val_loss: 0.1037 - val_accuracy: 0.9796
Epoch 17/20
1719/1719 [==============================] - 1s 584us/step - loss: 0.0152 - accuracy: 0.9964 - val_loss: 0.1062 - val_accuracy: 0.9792
Epoch 18/20
1719/1719 [==============================] - 1s 597us/step - loss: 0.0139 - accuracy: 0.9966 - val_loss: 0.1247 - val_accuracy: 0.9760
Epoch 19/20
1719/1719 [==============================] - 1s 594us/step - loss: 0.0123 - accuracy: 0.9971 - val_loss: 0.1218 - val_accuracy: 0.9770
Epoch 20/20
1719/1719 [==============================] - 1s 595us/step - loss: 0.0109 - accuracy: 0.9973 - val_loss: 0.1282 - val_accuracy: 0.9770
Evaluate the DNN model¶

In [71]:
model = tf.keras.models.load_model("exp_3_optimized.h5")
In [72]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
313/313 [==============================] - 0s 332us/step - loss: 0.0873 - accuracy: 0.9753
In [73]:
print(f'test acc: {test_acc}, test loss: {test_loss}')
test acc: 0.9753000140190125, test loss: 0.08734780550003052

Plot performance metrics¶

In [74]:
history_dict = history.history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
In [75]:
plt.subplots(figsize=(16,12))
plt.tight_layout()
display_training_curves(history_dict['accuracy'], history_dict['val_accuracy'], 'accuracy', 211)
display_training_curves(history_dict['loss'], history_dict['val_loss'], 'loss', 212)
[Figure: training vs. validation accuracy and loss curves]

Making Predictions¶

In [76]:
# Get the predicted classes:
# model.predict_classes() is deprecated in TensorFlow 2.7+,
# so take the argmax of the predicted probabilities instead:
pred_train = model.predict(train_images)
pred_classes = np.argmax(pred_train, axis=1)
1719/1719 [==============================] - 0s 270us/step
In [77]:
print_validation_report(train_labels, pred_classes)
Classification Report
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      5444
           1       0.99      0.99      0.99      6179
           2       0.98      0.99      0.98      5470
           3       0.98      0.98      0.98      5638
           4       0.98      0.99      0.99      5307
           5       0.99      0.98      0.98      4987
           6       0.99      0.99      0.99      5417
           7       0.99      0.99      0.99      5715
           8       0.98      0.98      0.98      5389
           9       0.98      0.98      0.98      5454

    accuracy                           0.99     55000
   macro avg       0.99      0.99      0.99     55000
weighted avg       0.99      0.99      0.99     55000

Accuracy Score: 0.9862
Root Mean Square Error: 0.5036051844631684

Create the confusion matrix¶

In [78]:
conf_mx = tf.math.confusion_matrix(train_labels, pred_classes)
conf_mx;
In [79]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(pred_train[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
Out[79]:
  0 1 2 3 4 5 6 7 8 9
0 0.00% 0.00% 0.11% 0.93% 0.00% 0.00% 0.00% 98.96% 0.00% 0.00%
1 0.00% 0.00% 0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
2 0.00% 0.00% 0.00% 0.01% 70.95% 0.00% 0.00% 0.01% 0.02% 29.02%
3 0.04% 0.00% 0.26% 0.00% 0.04% 0.06% 99.62% 0.00% 0.00% 0.00%
4 0.00% 99.93% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.05% 0.00%
5 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
6 0.00% 99.87% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.12% 0.00%
7 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
8 0.00% 0.00% 0.00% 0.03% 0.08% 0.00% 0.00% 0.00% 0.00% 99.88%
9 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.99% 0.00%
10 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
11 0.00% 0.00% 0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
12 0.00% 99.97% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.02% 0.00%
13 0.05% 0.00% 44.22% 39.71% 0.00% 15.63% 0.00% 0.35% 0.01% 0.02%
14 0.00% 0.00% 0.00% 0.03% 0.00% 0.00% 0.00% 99.97% 0.00% 0.00%
15 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
16 0.00% 0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
17 0.00% 0.00% 0.00% 0.08% 0.00% 0.00% 0.00% 0.00% 0.01% 99.91%
18 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% 0.00% 0.00%
19 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%

Visualize the confusion matrix¶

In [80]:
mtx = plot_confusion_matrix(train_labels,pred_classes)
[Figure: confusion matrix heatmap]

Most problematic classifications (actual, predicted):

  • 7, 9
  • 4, 9
  • 5, 6
In [81]:
cl_a, cl_b = 7, 9
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]

plt.figure(figsize=(8,8))

p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)

plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);


p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")

# plt.savefig("error_analysis_digits_plot_EXP1_valid")

plt.show()
[Figure: 5×5 digit grids for each actual/predicted combination of 7s and 9s]
In [82]:
cl_a, cl_b = 4, 9
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]

plt.figure(figsize=(8,8))

p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)

plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);


p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")

# plt.savefig("error_analysis_digits_plot_EXP1_valid")

plt.show()
[Figure: 5×5 digit grids for each actual/predicted combination of 4s and 9s]
In [83]:
cl_a, cl_b = 5, 6
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]

plt.figure(figsize=(8,8))

p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)

plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);


p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")

# plt.savefig("error_analysis_digits_plot_EXP1_valid")

plt.show()
[Figure: 5×5 digit grids for each actual/predicted combination of 5s and 6s]

Get Activation Values of the Hidden Nodes (128)¶

In [84]:
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

print(f"There are {len(layer_outputs)} layers")
layer_outputs; # description of the layers
There are 2 layers
In [85]:
# Get the outputs of all the hidden nodes for each of the 55000 training images
activations = activation_model.predict(train_images)
hidden_layer_activation = activations[0]
output_layer_activations = activations[1]
hidden_layer_activation.shape   #  each of the 128 hidden nodes has one activation value per training image
1719/1719 [==============================] - 1s 348us/step
Out[85]:
(55000, 128)
In [86]:
output_layer_activations.shape
Out[86]:
(55000, 10)
In [87]:
print(f"The maximum activation value of the hidden nodes in the hidden layer is \
{hidden_layer_activation.max()}")
The maximum activation value of the hidden nodes in the hidden layer is 16.965179443359375
In [88]:
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True)  # display probabilities as decimals and NOT in scientific notation
output_layer_activation = activations[1]
print(f"The output layer has shape {output_layer_activation.shape}")
print(f"The outputs for the first image are {output_layer_activation[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activation[0].sum()}")
The output layer has shape (55000, 10)
The outputs for the first image are [0.    0.    0.001 0.009 0.    0.    0.    0.99  0.    0.   ]
The sum of the probabilities is (approximately) 1.0

Create a dataframe with the activation values and the class labels¶

In [89]:
#Get the dataframe of all the node values
activation_data = {'actual_class':train_labels}
for k in range(0,128): 
    activation_data[f"act_val_{k}"] = hidden_layer_activation[:,k]

activation_df = pd.DataFrame(activation_data)
activation_df.head(15).round(3).T
Out[89]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
actual_class 7.000 3.000 4.000 6.000 1.000 8.000 1.000 0.000 9.000 8.000 0.000 3.000 1.000 2.000 7.000
act_val_0 0.000 3.711 0.187 0.000 0.000 3.741 0.000 0.000 0.193 2.811 3.123 2.113 0.000 0.059 0.000
act_val_1 0.000 3.684 2.029 0.000 0.633 1.451 0.668 0.000 0.916 2.311 0.000 3.573 0.819 0.000 0.000
act_val_2 0.000 0.000 0.000 4.265 1.466 0.000 1.405 1.154 0.000 0.690 0.000 2.623 1.461 0.000 0.000
act_val_3 0.000 0.000 0.000 0.000 0.425 0.000 0.000 1.125 0.000 0.000 0.000 1.647 0.114 2.469 0.000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
act_val_123 1.016 1.090 0.405 0.000 2.669 0.000 3.985 0.000 1.364 0.589 0.000 2.577 3.305 0.000 4.729
act_val_124 0.000 0.000 0.000 0.000 1.227 0.000 1.308 0.000 0.000 0.000 0.000 0.000 1.167 0.000 0.007
act_val_125 0.217 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.915 0.000 0.000 0.000 0.000
act_val_126 0.536 2.282 1.825 3.017 1.237 2.934 1.666 5.814 2.024 5.022 5.104 3.167 0.970 4.330 2.374
act_val_127 0.000 1.119 0.000 0.000 0.000 1.196 0.000 0.024 0.000 2.759 1.427 3.034 0.000 0.000 0.000

129 rows × 15 columns
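Building the 128-column DataFrame one column at a time works, but pandas can consume the 2-D activation array directly. A sketch with a small synthetic array standing in for hidden_layer_activation and train_labels:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins: 5 "images", 3 hidden-node activations each
acts = np.array([[0.0, 1.2, 0.3],
                 [2.1, 0.0, 0.9],
                 [0.4, 0.4, 0.0],
                 [1.0, 2.0, 3.0],
                 [0.0, 0.0, 0.5]])
labels = np.array([7, 3, 4, 6, 1])

# One DataFrame call instead of a column-by-column loop
df = pd.DataFrame(acts, columns=[f"act_val_{k}" for k in range(acts.shape[1])])
df.insert(0, "actual_class", labels)
print(df.shape)  # → (5, 4)
```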

Visualize the activation values with boxplots¶

In [90]:
# To see how closely the hidden node activation values correlate with the class labels
# Let us use seaborn for the boxplots this time.
plt.figure(figsize=(16,10))
bplot = sns.boxplot(y='act_val_0', x='actual_class', 
                 data=activation_df[['act_val_0','actual_class']], 
                 width=0.5,
                 palette="colorblind")
[Figure: seaborn boxplot of act_val_0 grouped by actual_class]

Displaying the Range of Activation Values for Each Class Label¶

In [91]:
activation_df.groupby("actual_class")["act_val_0"].apply(lambda x: [round(min(x.tolist()),2),
 round(max(x.tolist()),2)]).reset_index().rename(columns={"act_val_0": "range_of_act_values"})
Out[91]:
actual_class range_of_act_values
0 0 [0.0, 4.64]
1 1 [0.0, 4.84]
2 2 [0.0, 7.74]
3 3 [0.0, 6.48]
4 4 [0.0, 3.69]
5 5 [0.0, 3.39]
6 6 [0.0, 3.48]
7 7 [0.0, 6.1]
8 8 [0.0, 5.27]
9 9 [0.0, 4.54]
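The same per-class range can be obtained without the custom lambda by using groupby().agg(). A sketch on a toy DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"actual_class": [0, 0, 1, 1, 1],
                   "act_val_0": [0.0, 4.64, 0.0, 1.2, 4.84]})

# min/max per class in one call
ranges = df.groupby("actual_class")["act_val_0"].agg(["min", "max"])
print(ranges)
```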

Get the Pixel Values of the Input Images (784)¶

Create a dataframe with the pixel values and class labels¶

In [92]:
#Get the dataframe of all the pixel values
pixel_data = {'actual_class':train_labels}
for k in range(0,784): 
    pixel_data[f"pix_val_{k}"] = train_images[:,k]
pixel_df = pd.DataFrame(pixel_data)
pixel_df.head(15).round(3).T
Out[92]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
actual_class 7.0 3.0 4.0 6.0 1.0 8.0 1.0 0.0 9.0 8.0 0.0 3.0 1.0 2.0 7.0
pix_val_0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
pix_val_779 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_780 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_781 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_782 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
pix_val_783 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

785 rows × 15 columns

In [93]:
pixel_df.pix_val_77.value_counts()
Out[93]:
pix_val_77
0.000000    54741
1.000000       24
0.996078       10
0.992157        9
0.050980        5
            ...  
0.670588        1
0.858824        1
0.239216        1
0.839216        1
0.819608        1
Name: count, Length: 143, dtype: int64
In [94]:
pixel_df.pix_val_78.value_counts()
Out[94]:
pix_val_78
0.000000    54871
1.000000        5
0.992157        4
0.960784        4
0.098039        3
            ...  
0.047059        1
0.741176        1
0.568627        1
0.023529        1
0.501961        1
Name: count, Length: 92, dtype: int64

Use a scatter plot to visualize the predictive power of the pixel values at two fixed locations in the image, i.e. how well the pixel values at those two locations "predict" the class labels.¶

We use a scatter plot to examine how the pix_val_77 and pix_val_78 values relate to the actual_class values.

In [95]:
plt.figure(figsize=(16, 10))
color = sns.color_palette("hls", 10)
sns.scatterplot(x="pix_val_77", y="pix_val_78", hue="actual_class",  palette=color, data = pixel_df, legend="full")
plt.legend(loc='upper left');
[Figure: scatter plot of pix_val_77 vs. pix_val_78, colored by actual_class]
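Beyond eyeballing the scatter plot, a quick numeric check is the per-class mean of the two pixel columns: if the columns carried class information, the means would differ noticeably across classes. A minimal, hedged sketch on synthetic stand-in data (`demo_df` mimics the structure of `pixel_df`; it is not the notebook's data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pixel_df: two pixel columns plus class labels.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "actual_class": rng.integers(0, 10, size=1000),
    "pix_val_77": rng.random(1000),
    "pix_val_78": rng.random(1000),
})

# Per-class means of the two pixel columns; for uninformative columns like
# these random ones, the means barely move between classes.
class_means = demo_df.groupby("actual_class")[["pix_val_77", "pix_val_78"]].mean()
spread = class_means.max() - class_means.min()  # range of class means per column
print(spread.round(3))
```

On the real MNIST columns the same check would show whether a single pixel location separates any digits at all.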

PCA Feature Reduction / Model Optimization¶

Use PCA decomposition to reduce the number of features from 784 features to 2 features¶

In [96]:
# Separating out the features
features = [*pixel_data][1:] # ['pix_val_0', 'pix_val_1',...]
x = pixel_df.loc[:, features].values 

pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2'])
In [97]:
pixel_pca_df = pd.concat([principalDf, pixel_df[['actual_class']]], axis = 1)
In [98]:
pixel_pca_df.head().round(3)
Out[98]:
principal component 1 principal component 2 actual_class
0 0.725 -2.433 7
1 0.473 1.005 3
2 -0.094 -3.010 4
3 0.221 -0.725 6
4 -3.680 2.086 1
In [99]:
pca.explained_variance_ratio_
Out[99]:
array([0.097, 0.071], dtype=float32)

Use a scatter plot to visualize the predictive power of the two principal component values.¶

In [100]:
plt.figure(figsize=(16,10))
sns.scatterplot(
    x="principal component 1", y="principal component 2",
    hue="actual_class",
    palette=sns.color_palette("hls", 10),
    data=pixel_pca_df,
    legend="full",
    alpha=0.3
);
[Figure: scatter plot of the two pixel principal components, colored by actual_class]

Use PCA decomposition to reduce the (activation) features from 128 (= num of hidden nodes) to 2¶

In [101]:
# Separating out the features
features = [*activation_data][1:] # ['act_val_0', 'act_val_1',...]
x = activation_df.loc[:, features].values 

pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2'])
principalDf.head().round(3)
Out[101]:
principal component 1 principal component 2
0 1.218 -6.660
1 -4.480 -0.455
2 -3.130 -5.158
3 2.366 4.796
4 -6.708 4.235
In [102]:
activation_pca_df = pd.concat([principalDf, activation_df[['actual_class']]], axis = 1)
activation_pca_df.head().round(3)
Out[102]:
principal component 1 principal component 2 actual_class
0 1.218 -6.660 7
1 -4.480 -0.455 3
2 -3.130 -5.158 4
3 2.366 4.796 6
4 -6.708 4.235 1
In [103]:
ev=pca.explained_variance_ratio_
ev
Out[103]:
array([0.172, 0.117], dtype=float32)
In [104]:
print(f'The {len(ev)} principal components summed together {ev[0]:.3f} + {ev[1]:.3f} = {sum(ev):.3f} explained variance')
The 2 principal components summed together 0.172 + 0.117 = 0.289 explained variance
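Summing the first few entries of `explained_variance_ratio_` generalizes to any number of components via `np.cumsum`, which also tells you how many components are needed to reach a target variance fraction. A minimal sketch on synthetic data (variable names are illustrative, not from the notebook):

```python
import numpy as np
from sklearn.decomposition import PCA

# Correlated synthetic features: 500 samples, 20 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))

pca = PCA().fit(X)  # keep all components
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches 90%:
k = int(np.searchsorted(cumulative, 0.90)) + 1
print(k, cumulative[-1].round(3))
```

Because all components are kept, the cumulative ratio ends at 1.0; the `searchsorted` lookup is the same logic scikit-learn applies internally when `n_components` is given as a fraction.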

Use a scatter plot to visualize the predictive power of two principal component values.¶

In [105]:
plt.figure(figsize=(16,10))
sns.scatterplot(
    x="principal component 1", y="principal component 2",
    hue="actual_class",
    palette=sns.color_palette("hls", 10),
    data=activation_pca_df,
    legend="full",
    alpha=0.3
);
[Figure: scatter plot of the two activation principal components, colored by actual_class]

Use PCA decomposition to reduce the (activation) features from 128 (= num of hidden nodes) to 3¶

In [106]:
# Separating out the features
features = [*activation_data][1:] # ['act_val_0', 'act_val_1',...]
x = activation_df.loc[:, features].values 

pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['pca-one', 'pca-two', 'pca-three'])
principalDf.head(10).round(3).T
Out[106]:
0 1 2 3 4 5 6 7 8 9
pca-one 1.218 -4.480 -3.130 2.366 -6.708 -0.023 -7.246 13.832 -1.936 1.288
pca-two -6.660 -0.455 -5.158 4.796 4.235 2.463 4.553 -2.232 -4.552 3.532
pca-three -0.856 9.390 -1.377 -4.507 -1.830 2.874 -1.018 -0.319 -1.956 4.567
In [107]:
ev=pca.explained_variance_ratio_
ev
Out[107]:
array([0.172, 0.117, 0.099], dtype=float32)
In [108]:
print(f'The {len(ev)} principal components summed together {ev[0]:.3f} + {ev[1]:.3f} + {ev[2]:.3f} = {sum(ev):.3f} explained variance')
The 3 principal components summed together 0.172 + 0.117 + 0.099 = 0.389 explained variance
In [109]:
activation_pca_df = pd.concat([principalDf, activation_df[['actual_class']]], axis = 1)
activation_pca_df.head().round(3)
Out[109]:
pca-one pca-two pca-three actual_class
0 1.218 -6.660 -0.856 7
1 -4.480 -0.455 9.390 3
2 -3.130 -5.158 -1.377 4
3 2.366 4.796 -4.507 6
4 -6.708 4.235 -1.830 1

Use t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce the (activation) features from 128 (= num of hidden nodes) to 2¶

t-Distributed Stochastic Neighbor Embedding (t-SNE) is another technique for dimensionality reduction and is particularly well suited for the visualization of high-dimensional datasets. Because the technique is computationally expensive, it is common to restrict it to a subset of the data; here we set N=55000 and use all 55,000 training images.

See http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
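Before running t-SNE on 55,000 activation vectors, it can help to see the API on a toy problem. The sketch below (hypothetical data, not the notebook's activations) embeds two well-separated 10-D Gaussian blobs into 2-D with the same `init`/`learning_rate` settings used later:

```python
import numpy as np
from sklearn.manifold import TSNE

# Two well-separated Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 10))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 10))
X = np.vstack([blob_a, blob_b])

# perplexity must be smaller than the number of samples.
emb = TSNE(n_components=2, init="pca", learning_rate="auto",
           perplexity=10, random_state=0).fit_transform(X)
print(emb.shape)  # one 2-D point per input row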

In [110]:
activation_df.shape
Out[110]:
(55000, 129)
In [111]:
N=55000
activation_df_subset = activation_df.iloc[:N].copy()
activation_df_subset.shape
Out[111]:
(55000, 129)
In [112]:
data_subset = activation_df_subset[features].values
data_subset.shape
Out[112]:
(55000, 128)
In [113]:
%%time
tsne = TSNE(n_components=2 # embed into 2 dimensions
            ,init='pca'
            ,learning_rate='auto'
            ,verbose=1
            ,perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(data_subset)
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 55000 samples in 0.012s...
[t-SNE] Computed neighbors for 55000 samples in 6.394s...
[t-SNE] Computed conditional probabilities for sample 1000 / 55000
[t-SNE] Computed conditional probabilities for sample 2000 / 55000
[t-SNE] Computed conditional probabilities for sample 3000 / 55000
[t-SNE] Computed conditional probabilities for sample 4000 / 55000
[t-SNE] Computed conditional probabilities for sample 5000 / 55000
[t-SNE] Computed conditional probabilities for sample 6000 / 55000
[t-SNE] Computed conditional probabilities for sample 7000 / 55000
[t-SNE] Computed conditional probabilities for sample 8000 / 55000
[t-SNE] Computed conditional probabilities for sample 9000 / 55000
[t-SNE] Computed conditional probabilities for sample 10000 / 55000
[t-SNE] Computed conditional probabilities for sample 11000 / 55000
[t-SNE] Computed conditional probabilities for sample 12000 / 55000
[t-SNE] Computed conditional probabilities for sample 13000 / 55000
[t-SNE] Computed conditional probabilities for sample 14000 / 55000
[t-SNE] Computed conditional probabilities for sample 15000 / 55000
[t-SNE] Computed conditional probabilities for sample 16000 / 55000
[t-SNE] Computed conditional probabilities for sample 17000 / 55000
[t-SNE] Computed conditional probabilities for sample 18000 / 55000
[t-SNE] Computed conditional probabilities for sample 19000 / 55000
[t-SNE] Computed conditional probabilities for sample 20000 / 55000
[t-SNE] Computed conditional probabilities for sample 21000 / 55000
[t-SNE] Computed conditional probabilities for sample 22000 / 55000
[t-SNE] Computed conditional probabilities for sample 23000 / 55000
[t-SNE] Computed conditional probabilities for sample 24000 / 55000
[t-SNE] Computed conditional probabilities for sample 25000 / 55000
[t-SNE] Computed conditional probabilities for sample 26000 / 55000
[t-SNE] Computed conditional probabilities for sample 27000 / 55000
[t-SNE] Computed conditional probabilities for sample 28000 / 55000
[t-SNE] Computed conditional probabilities for sample 29000 / 55000
[t-SNE] Computed conditional probabilities for sample 30000 / 55000
[t-SNE] Computed conditional probabilities for sample 31000 / 55000
[t-SNE] Computed conditional probabilities for sample 32000 / 55000
[t-SNE] Computed conditional probabilities for sample 33000 / 55000
[t-SNE] Computed conditional probabilities for sample 34000 / 55000
[t-SNE] Computed conditional probabilities for sample 35000 / 55000
[t-SNE] Computed conditional probabilities for sample 36000 / 55000
[t-SNE] Computed conditional probabilities for sample 37000 / 55000
[t-SNE] Computed conditional probabilities for sample 38000 / 55000
[t-SNE] Computed conditional probabilities for sample 39000 / 55000
[t-SNE] Computed conditional probabilities for sample 40000 / 55000
[t-SNE] Computed conditional probabilities for sample 41000 / 55000
[t-SNE] Computed conditional probabilities for sample 42000 / 55000
[t-SNE] Computed conditional probabilities for sample 43000 / 55000
[t-SNE] Computed conditional probabilities for sample 44000 / 55000
[t-SNE] Computed conditional probabilities for sample 45000 / 55000
[t-SNE] Computed conditional probabilities for sample 46000 / 55000
[t-SNE] Computed conditional probabilities for sample 47000 / 55000
[t-SNE] Computed conditional probabilities for sample 48000 / 55000
[t-SNE] Computed conditional probabilities for sample 49000 / 55000
[t-SNE] Computed conditional probabilities for sample 50000 / 55000
[t-SNE] Computed conditional probabilities for sample 51000 / 55000
[t-SNE] Computed conditional probabilities for sample 52000 / 55000
[t-SNE] Computed conditional probabilities for sample 53000 / 55000
[t-SNE] Computed conditional probabilities for sample 54000 / 55000
[t-SNE] Computed conditional probabilities for sample 55000 / 55000
[t-SNE] Mean sigma: 2.328261
[t-SNE] KL divergence after 250 iterations with early exaggeration: 88.776878
[t-SNE] KL divergence after 300 iterations: 3.853568
CPU times: user 3min 11s, sys: 8.89 s, total: 3min 20s
Wall time: 2min 26s
In [114]:
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
In [115]:
tsne_results
Out[115]:
array([[0.341, 0.373],
       [0.469, 0.732],
       [0.611, 0.156],
       ...,
       [0.684, 0.482],
       [0.729, 0.874],
       [0.226, 0.58 ]], dtype=float32)
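The cell above rescales the embedding with a single global min and max taken over both coordinates. A common alternative, sketched below on dummy points (not the t-SNE output), scales each axis independently with `axis=0`, so both coordinates span the full [0, 1] range:

```python
import numpy as np

pts = np.array([[2.0, 10.0],
                [4.0, 30.0],
                [6.0, 50.0]])

# Per-column min-max scaling: each axis is mapped to [0, 1] independently.
mins, maxs = pts.min(axis=0), pts.max(axis=0)
scaled = (pts - mins) / (maxs - mins)
print(scaled)  # [[0. 0.] [0.5 0.5] [1. 1.]]
```

For t-SNE the absolute coordinates are arbitrary anyway, so either normalization is fine for plotting; per-axis scaling just uses the plot area more evenly.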
In [116]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
plt.scatter(tsne_results[:,0],tsne_results[:,1], c=train_labels, s=10, cmap=cmap)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(train_images[index].reshape(28,28), cmap="binary"),
            position, bboxprops={"edgecolor": cmap(train_labels[index]), "lw": 2})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()
[Figure: 2-D t-SNE embedding of the hidden-layer activations, colored by digit class, with sample digit images overlaid]

Experiment 4¶

  • 28x28 images (784 pixels) reduced to 154 input nodes via PCA (n_components=0.95)
  • hidden layer: 85 nodes
  • output layer: 10 nodes
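When `n_components` is a float in (0, 1), scikit-learn's PCA keeps the smallest number of components whose cumulative explained variance reaches that fraction; that is how 784 pixels become 154 inputs here. A minimal sketch on synthetic low-rank data (illustrative names, not the MNIST pixels):

```python
import numpy as np
from sklearn.decomposition import PCA

# 300 samples, 50 features driven by ~5 latent factors plus small noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 5))
X = latent @ rng.normal(size=(5, 50)) + 0.1 * rng.normal(size=(300, 50))

pca = PCA(n_components=0.95)      # keep enough components for 95% of the variance
X_red = pca.fit_transform(X)
kept = pca.n_components_          # number of components actually retained
print(kept, X_red.shape)
```

Because the data has roughly 5 strong directions, only a handful of components are retained; the same mechanism selects 154 of 784 directions for the MNIST pixels.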
In [117]:
pca = PCA(n_components=0.95)
train_images_red = pca.fit_transform(train_images)
val_images_red = pca.transform(val_images)
test_images_red = pca.transform(test_images)
In [118]:
test_images_red.shape, train_images_red.shape, val_images_red.shape
Out[118]:
((10000, 154), (55000, 154), (5000, 154))
Build the DNN model¶

In [119]:
#k.clear_session()

model = Sequential([
    Dense(name = 'hidden_layer_1', units=85, activation='relu', input_shape=[154]),
    Dense(name = 'output_layer', units = 10, activation ='softmax')
])
In [120]:
model.summary()
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 hidden_layer_1 (Dense)      (None, 85)                13175     
                                                                 
 output_layer (Dense)        (None, 10)                860       
                                                                 
=================================================================
Total params: 14,035
Trainable params: 14,035
Non-trainable params: 0
_________________________________________________________________
In [121]:
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True)
Out[121]:
[Figure: plot_model diagram of the network architecture]

Compile the DNN model¶

In [122]:
model.compile(optimizer='rmsprop',
               loss = 'sparse_categorical_crossentropy',
               metrics=['accuracy'])

Train the DNN model¶

In [123]:
history = model.fit(train_images_red
    , train_labels
    , epochs=15
    , validation_data=(val_images_red, val_labels)
    , callbacks=[tf.keras.callbacks.ModelCheckpoint("exp_4_optimized.h5",save_best_only=True,save_weights_only=False)] 
    )
Epoch 1/15
1719/1719 [==============================] - 1s 353us/step - loss: 0.3471 - accuracy: 0.9046 - val_loss: 0.1584 - val_accuracy: 0.9550
Epoch 2/15
1719/1719 [==============================] - 1s 320us/step - loss: 0.1367 - accuracy: 0.9594 - val_loss: 0.1133 - val_accuracy: 0.9676
Epoch 3/15
1719/1719 [==============================] - 1s 328us/step - loss: 0.0962 - accuracy: 0.9717 - val_loss: 0.0981 - val_accuracy: 0.9718
Epoch 4/15
1719/1719 [==============================] - 1s 339us/step - loss: 0.0736 - accuracy: 0.9789 - val_loss: 0.0877 - val_accuracy: 0.9736
Epoch 5/15
1719/1719 [==============================] - 1s 340us/step - loss: 0.0597 - accuracy: 0.9831 - val_loss: 0.0834 - val_accuracy: 0.9756
Epoch 6/15
1719/1719 [==============================] - 1s 345us/step - loss: 0.0489 - accuracy: 0.9862 - val_loss: 0.0815 - val_accuracy: 0.9780
Epoch 7/15
1719/1719 [==============================] - 1s 346us/step - loss: 0.0406 - accuracy: 0.9890 - val_loss: 0.0811 - val_accuracy: 0.9784
Epoch 8/15
1719/1719 [==============================] - 1s 341us/step - loss: 0.0343 - accuracy: 0.9908 - val_loss: 0.0798 - val_accuracy: 0.9792
Epoch 9/15
1719/1719 [==============================] - 1s 344us/step - loss: 0.0289 - accuracy: 0.9926 - val_loss: 0.0800 - val_accuracy: 0.9788
Epoch 10/15
1719/1719 [==============================] - 1s 329us/step - loss: 0.0248 - accuracy: 0.9936 - val_loss: 0.0804 - val_accuracy: 0.9816
Epoch 11/15
1719/1719 [==============================] - 1s 321us/step - loss: 0.0210 - accuracy: 0.9950 - val_loss: 0.0821 - val_accuracy: 0.9798
Epoch 12/15
1719/1719 [==============================] - 1s 323us/step - loss: 0.0176 - accuracy: 0.9960 - val_loss: 0.0866 - val_accuracy: 0.9790
Epoch 13/15
1719/1719 [==============================] - 1s 323us/step - loss: 0.0153 - accuracy: 0.9966 - val_loss: 0.0862 - val_accuracy: 0.9800
Epoch 14/15
1719/1719 [==============================] - 1s 323us/step - loss: 0.0131 - accuracy: 0.9976 - val_loss: 0.0867 - val_accuracy: 0.9798
Epoch 15/15
1719/1719 [==============================] - 1s 324us/step - loss: 0.0111 - accuracy: 0.9978 - val_loss: 0.0940 - val_accuracy: 0.9794
In [124]:
model = tf.keras.models.load_model("exp_4_optimized.h5")

Evaluate the DNN model¶

In [125]:
test_loss, test_acc = model.evaluate(test_images_red, test_labels)
313/313 [==============================] - 0s 266us/step - loss: 0.0824 - accuracy: 0.9766
In [126]:
print(f'test acc: {test_acc}, test loss: {test_loss}')
test acc: 0.9765999913215637, test loss: 0.08242236077785492

Plot performance metrics¶

In [127]:
history_dict = history.history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
In [128]:
plt.subplots(figsize=(16,12))
plt.tight_layout()
display_training_curves(history_dict['accuracy'], history_dict['val_accuracy'], 'accuracy', 211)
display_training_curves(history_dict['loss'], history_dict['val_loss'], 'loss', 212)
[Figure: training vs. validation accuracy and loss curves]

Making Predictions¶

In [129]:
# Get the predicted classes:
# model.predict_classes() is deprecated in TensorFlow 2.7+
# pred_classes = model.predict(train_images)
# alternate method:
pred_train=model.predict(train_images_red) 
pred_classes=np.argmax(pred_train, axis=1)
1719/1719 [==============================] - 0s 201us/step
In [130]:
print_validation_report(train_labels, pred_classes)
Classification Report
              precision    recall  f1-score   support

           0       0.99      1.00      1.00      5444
           1       1.00      1.00      1.00      6179
           2       0.99      0.99      0.99      5470
           3       1.00      0.99      0.99      5638
           4       1.00      1.00      1.00      5307
           5       1.00      1.00      1.00      4987
           6       1.00      1.00      1.00      5417
           7       0.99      1.00      0.99      5715
           8       0.99      0.99      0.99      5389
           9       1.00      0.99      0.99      5454

    accuracy                           0.99     55000
   macro avg       0.99      0.99      0.99     55000
weighted avg       0.99      0.99      0.99     55000

Accuracy Score: 0.9946181818181818
Root Mean Square Error: 0.33780037138684577

Create the confusion matrix¶

In [131]:
conf_mx = tf.math.confusion_matrix(train_labels, pred_classes)
conf_mx;
In [132]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(pred_train[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
Out[132]:
  0 1 2 3 4 5 6 7 8 9
0 0.00% 0.00% 0.00% 8.19% 0.00% 0.00% 0.00% 91.81% 0.00% 0.00%
1 0.00% 0.00% 0.00% 99.98% 0.00% 0.00% 0.00% 0.00% 0.01% 0.01%
2 0.00% 0.00% 0.00% 0.00% 83.03% 0.00% 0.00% 0.05% 0.01% 16.92%
3 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 99.99% 0.00% 0.00% 0.00%
4 0.00% 99.95% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.05% 0.00%
5 0.00% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 99.99% 0.00%
6 0.00% 99.88% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.12% 0.00%
7 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
8 0.00% 0.00% 0.00% 0.00% 0.03% 0.01% 0.00% 0.02% 0.00% 99.94%
9 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
10 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
11 0.00% 0.00% 0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
12 0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
13 0.00% 0.00% 92.31% 7.65% 0.00% 0.03% 0.00% 0.01% 0.00% 0.00%
14 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% 0.00%
15 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
16 0.00% 0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
17 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.05% 0.01% 99.94%
18 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% 0.00% 0.00%
19 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%

Visualize the confusion matrix¶

In [133]:
mtx = plot_confusion_matrix(train_labels,pred_classes)
[Figure: confusion matrix heatmap]
In [134]:
cl_a, cl_b = 9, 4
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]

plt.figure(figsize=(8,8))

p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)

plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);


p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")

# plt.savefig("error_analysis_digits_plot_EXP1_valid")

plt.show()
[Figure: sample digits grouped by true vs. predicted class for classes 9 and 4]

Get Activation Values of the Hidden Nodes (85)¶

In [135]:
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]
In [136]:
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
In [137]:
print(f"There are {len(layer_outputs)} layers")
layer_outputs; # description of the layers
There are 2 layers
In [138]:
# Get the outputs of all the hidden nodes for each of the 55,000 training images
activations = activation_model.predict(train_images_red)
hidden_layer_activation = activations[0]
output_layer_activations = activations[1]
hidden_layer_activation.shape   #  each of the 85 hidden nodes has one activation value per training image
1719/1719 [==============================] - 0s 254us/step
Out[138]:
(55000, 85)
In [139]:
output_layer_activations.shape
Out[139]:
(55000, 10)
In [140]:
print(f"The maximum activation value of the hidden nodes in the hidden layer is \
{hidden_layer_activation.max()}")
The maximum activation value of the hidden nodes in the hidden layer is 9.398319244384766
In [141]:
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True)  # display probabilities as decimals and NOT in scientific notation
output_layer_activation = activations[1]
print(f"The output layer has shape {output_layer_activation.shape}")
print(f"The outputs for the first image are {output_layer_activation[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activation[0].sum()}")
The output layer has shape (55000, 10)
The outputs for the first image are [0.    0.    0.    0.082 0.    0.    0.    0.918 0.    0.   ]
The sum of the probabilities is (approximately) 1.0
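The rows sum to 1 because the output layer applies a softmax, which exponentiates the logits and normalizes by the row sum. A minimal sketch (hand-written function, not the Keras implementation):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1 (up to floating-point error)
```

This is why the per-image outputs can be read as a probability distribution over the ten digit classes.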

Create a dataframe with the activation values and the class labels¶

In [142]:
#Get the dataframe of all the node values
activation_data = {'actual_class':train_labels}
for k in range(0,85): 
    activation_data[f"act_val_{k}"] = hidden_layer_activation[:,k]

activation_df = pd.DataFrame(activation_data)
activation_df.head(15).round(3).T
Out[142]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
actual_class 7.000 3.000 4.000 6.000 1.000 8.000 1.000 0.000 9.000 8.000 0.000 3.000 1.000 2.000 7.000
act_val_0 0.000 0.629 0.000 0.000 0.000 0.422 1.384 0.212 0.000 0.000 0.000 0.000 0.351 2.546 1.246
act_val_1 0.213 1.582 0.919 0.145 0.871 1.471 1.214 0.000 0.758 1.866 0.000 2.395 1.259 0.000 1.608
act_val_2 0.877 1.153 0.000 0.178 0.000 0.664 0.295 0.000 0.000 0.000 0.254 0.594 0.000 2.288 1.363
act_val_3 0.000 0.000 1.905 0.995 0.000 0.000 0.000 0.996 0.683 1.919 1.783 0.292 0.000 0.000 0.086
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
act_val_80 0.000 0.000 0.000 0.000 0.000 1.765 0.000 2.322 0.000 0.382 2.768 0.511 0.000 2.191 0.000
act_val_81 2.018 3.580 1.706 0.000 0.948 0.000 0.988 0.000 2.274 0.745 0.437 1.219 0.976 0.879 1.930
act_val_82 0.000 0.000 0.073 0.000 1.834 0.000 2.021 0.000 0.000 0.000 0.000 0.000 1.470 0.000 2.294
act_val_83 0.000 0.000 0.000 1.118 4.013 0.000 3.589 0.000 0.000 0.000 0.000 0.000 3.767 0.000 0.997
act_val_84 0.000 0.000 2.170 1.310 0.000 0.000 0.000 1.618 1.125 1.628 0.814 0.696 0.000 0.238 0.000

86 rows × 15 columns

Visualize the activation values with boxplots¶

In [143]:
# To see how closely the hidden node activation values correlate with the class labels
# Let us use seaborn for the boxplots this time.
plt.figure(figsize=(16,10))
bplot = sns.boxplot(y='act_val_0', x='actual_class', 
                 data=activation_df[['act_val_0','actual_class']], 
                 width=0.5,
                 palette="colorblind")
[Figure: boxplots of act_val_0 activation values by actual_class]

Displaying the Range of Activation Values for Each Class Label¶

In [144]:
activation_df.groupby("actual_class")["act_val_0"].apply(lambda x: [round(min(x.tolist()),2),
 round(max(x.tolist()),2)]).reset_index().rename(columns={"act_val_0": "range_of_act_values"})
Out[144]:
actual_class range_of_act_values
0 0 [0.0, 5.56]
1 1 [0.0, 5.25]
2 2 [0.0, 5.23]
3 3 [0.0, 6.06]
4 4 [0.0, 3.04]
5 5 [0.0, 8.19]
6 6 [0.0, 6.19]
7 7 [0.0, 4.49]
8 8 [0.0, 5.89]
9 9 [0.0, 4.02]

Get the PCA Feature Values (154)¶

Create a dataframe with the PCA feature values and class labels¶

In [145]:
#Get the dataframe of all the pixel values
pixel_data = {'actual_class':train_labels}
for k in range(0,154): 
    pixel_data[f"pix_val_{k}"] = train_images_red[:,k]
pixel_df = pd.DataFrame(pixel_data)
pixel_df.head(15).round(3).T
Out[145]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
actual_class 7.000 3.000 4.000 6.000 1.000 8.000 1.000 0.000 9.000 8.000 0.000 3.000 1.000 2.000 7.000
pix_val_0 0.725 0.473 -0.094 0.221 -3.679 1.303 -3.645 6.441 -0.511 1.735 5.038 0.703 -3.438 3.187 -1.907
pix_val_1 -2.433 1.005 -3.010 -0.725 2.086 0.938 2.637 0.618 -2.159 2.033 1.001 2.985 0.737 1.372 -1.918
pix_val_2 1.537 0.502 2.129 -2.279 -0.551 -1.222 -0.458 -1.207 2.395 0.174 1.917 1.507 0.137 0.066 0.498
pix_val_3 -2.445 3.738 0.838 -1.903 -0.906 2.802 -0.120 0.589 -0.949 1.990 -0.634 1.158 -0.161 2.586 -0.127
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
pix_val_149 0.242 0.161 0.033 -0.102 -0.082 -0.114 0.094 0.166 -0.262 -0.170 0.033 -0.044 -0.011 -0.184 -0.051
pix_val_150 -0.115 -0.105 -0.006 0.040 -0.061 0.050 -0.130 -0.191 -0.010 -0.439 0.139 0.018 0.115 0.028 -0.050
pix_val_151 -0.367 -0.218 0.013 0.048 -0.117 -0.117 0.116 -0.141 0.022 -0.228 -0.135 0.150 0.002 0.108 -0.111
pix_val_152 -0.063 0.148 0.125 -0.071 0.007 0.083 -0.046 0.120 0.017 0.283 -0.077 0.002 0.080 0.006 0.256
pix_val_153 -0.264 0.007 -0.035 0.028 0.021 -0.087 0.055 0.071 0.183 0.041 -0.132 0.142 -0.173 0.105 0.268

155 rows × 15 columns

In [146]:
pixel_df.pix_val_77.value_counts()
Out[146]:
pix_val_77
 0.233526    2
 0.222589    2
-0.106621    2
 0.335311    2
 0.372639    2
            ..
-0.097114    1
-0.040634    1
 0.232502    1
 0.125765    1
 0.271186    1
Name: count, Length: 54958, dtype: int64
In [147]:
pixel_df.pix_val_78.value_counts()
Out[147]:
pix_val_78
 0.301367    2
-0.072439    2
 0.184436    2
 0.491759    2
-0.225153    2
            ..
-0.054373    1
-0.579701    1
 0.121835    1
 0.007295    1
-0.239521    1
Name: count, Length: 54980, dtype: int64

Use a scatter plot to visualize the predictive power of two fixed PCA features, i.e. how well the pix_val_77 and pix_val_78 values (two of the 154 PCA components) "predict" the class labels.¶

We use a scatter plot to examine how the pix_val_77 and pix_val_78 values relate to the actual_class values.

In [148]:
plt.figure(figsize=(16, 10))
color = sns.color_palette("hls", 10)
sns.scatterplot(x="pix_val_77", y="pix_val_78", hue="actual_class",  palette=color, data = pixel_df, legend="full")
plt.legend(loc='upper left');
[Figure: scatter plot of pix_val_77 vs. pix_val_78 (PCA features), colored by actual_class]

PCA Feature Reduction / Model Optimization¶

Use PCA decomposition to reduce the number of features from 154 features to 2 features¶

In [149]:
# Separating out the features
features = [*pixel_data][1:] # ['pix_val_0', 'pix_val_1',...]
x = pixel_df.loc[:, features].values 

pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2'])
In [150]:
pixel_pca_df = pd.concat([principalDf, pixel_df[['actual_class']]], axis = 1)
In [151]:
pixel_pca_df.head().round(3)
Out[151]:
principal component 1 principal component 2 actual_class
0 0.725 -2.433 7
1 0.473 1.005 3
2 -0.094 -3.010 4
3 0.221 -0.725 6
4 -3.680 2.086 1
In [152]:
pca.explained_variance_ratio_
Out[152]:
array([0.102, 0.074], dtype=float32)

Use a scatter plot to visualize the predictive power of the two principal component values.¶

In [153]:
plt.figure(figsize=(16,10))
sns.scatterplot(
    x="principal component 1", y="principal component 2",
    hue="actual_class",
    palette=sns.color_palette("hls", 10),
    data=pixel_pca_df,
    legend="full",
    alpha=0.3
);
[Figure: scatter plot of the two principal components, colored by actual_class]

Use PCA decomposition to reduce the (activation) features from 85 (= num of hidden nodes) to 2¶

In [154]:
# Separating out the features
features = [*activation_data][1:] # ['act_val_0', 'act_val_1',...]
x = activation_df.loc[:, features].values 

pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2'])
principalDf.head().round(3)
Out[154]:
principal component 1 principal component 2
0 -1.197 2.767
1 4.828 -3.830
2 -0.279 2.659
3 -1.828 1.836
4 3.974 1.436
In [155]:
activation_pca_df = pd.concat([principalDf, activation_df[['actual_class']]], axis = 1)
activation_pca_df.head().round(3)
Out[155]:
principal component 1 principal component 2 actual_class
0 -1.197 2.767 7
1 4.828 -3.830 3
2 -0.279 2.659 4
3 -1.828 1.836 6
4 3.974 1.436 1
In [156]:
ev=pca.explained_variance_ratio_
ev
Out[156]:
array([0.108, 0.087], dtype=float32)
In [157]:
print(f'The {len(ev)} principal components summed together {ev[0]:.3f} + {ev[1]:.3f} = {sum(ev):.3f} explained variance')
The 2 principal components summed together 0.108 + 0.087 = 0.196 explained variance
In [158]:
plt.figure(figsize=(16,10))
sns.scatterplot(
    x="principal component 1", y="principal component 2",
    hue="actual_class",
    palette=sns.color_palette("hls", 10),
    data=activation_pca_df,
    legend="full",
    alpha=0.3
);
[Figure: scatter plot of the two activation principal components, colored by actual_class]

Use PCA decomposition to reduce the (activation) features from 85 (= num of hidden nodes) to 3¶

In [159]:
# Separating out the features
features = [*activation_data][1:] # ['act_val_0', 'act_val_1',...]
x = activation_df.loc[:, features].values 

pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['pca-one', 'pca-two', 'pca-three'])
principalDf.head(10).round(3).T
Out[159]:
0 1 2 3 4 5 6 7 8 9
pca-one -1.197 4.828 -0.279 -1.828 3.974 1.828 4.959 -7.113 -0.993 2.806
pca-two 2.767 -3.830 2.659 1.836 1.436 -2.776 1.040 -3.955 1.950 -4.261
pca-three 1.414 1.610 2.149 -1.709 -1.596 -3.101 -1.840 -1.486 3.595 -0.750
In [160]:
ev=pca.explained_variance_ratio_
ev
Out[160]:
array([0.108, 0.087, 0.077], dtype=float32)
In [161]:
print(f'The {len(ev)} principal components summed together {ev[0]:.3f} + {ev[1]:.3f} + {ev[2]:.3f} = {sum(ev):.3f} explained variance')
The 3 principal components summed together 0.108 + 0.087 + 0.077 = 0.273 explained variance
In [162]:
activation_pca_df = pd.concat([principalDf, activation_df[['actual_class']]], axis = 1)
activation_pca_df.head().round(3)
Out[162]:
pca-one pca-two pca-three actual_class
0 -1.197 2.767 1.414 7
1 4.828 -3.830 1.610 3
2 -0.279 2.659 2.149 4
3 -1.828 1.836 -1.709 6
4 3.974 1.436 -1.596 1

Use t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce the (activation) features from 85 (= num of hidden nodes) to 2¶

In [163]:
activation_df.shape
Out[163]:
(55000, 86)
In [164]:
N=55000
activation_df_subset = activation_df.iloc[:N].copy()
activation_df_subset.shape
Out[164]:
(55000, 86)
In [165]:
data_subset = activation_df_subset[features].values
data_subset.shape
Out[165]:
(55000, 85)
In [166]:
%%time
tsne = TSNE(n_components=2 # embed into 2 dimensions (t-SNE requires an integer here)
            ,init='pca'
            ,learning_rate='auto'
            ,verbose=1
            ,perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(data_subset)
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 55000 samples in 0.010s...
[t-SNE] Computed neighbors for 55000 samples in 3.804s...
[t-SNE] Computed conditional probabilities for sample 1000 / 55000
...
[t-SNE] Computed conditional probabilities for sample 55000 / 55000
[t-SNE] Mean sigma: 1.889768
[t-SNE] KL divergence after 250 iterations with early exaggeration: 90.279701
[t-SNE] KL divergence after 300 iterations: 3.914978
CPU times: user 3min 2s, sys: 7.97 s, total: 3min 10s
Wall time: 2min 29s
In [167]:
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
In [168]:
tsne_results
Out[168]:
array([[0.67 , 0.902],
       [0.711, 0.163],
       [0.305, 0.746],
       ...,
       [0.404, 0.374],
       [0.152, 0.42 ],
       [0.617, 0.562]], dtype=float32)
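The normalization above uses a single global min and max across both embedding columns, which maps the embedding into [0, 1] while preserving its aspect ratio (per-column scaling would stretch it). A self-contained sketch on stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 2))   # stand-in for tsne_results

# Global min-max scaling into [0, 1], as in the cell above
emb_norm = (emb - emb.min()) / (emb.max() - emb.min())
```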
In [169]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
plt.scatter(tsne_results[:,0],tsne_results[:,1], c=train_labels, s=10, cmap=cmap)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(train_images[index].reshape(28,28), cmap="binary"),
            position, bboxprops={"edgecolor": cmap(train_labels[index]), "lw": 2})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()
[Image: 2-D t-SNE embedding of the hidden-layer activations, colored by digit class, with sample digit images overlaid]
In [ ]:
 

Experiment 5¶

  • 28x28 images (784 pixels) with dimensionality reduction via random forests
  • hidden layer: 85 nodes
  • output layer: 10 nodes

Reducing dimensionality of the data with Random Forests.¶

We create a Random Forest Classifier (with the default 100 trees) and use it to compute the relative importance of the 784 features (pixels) in the training set. We produce a heat map to visualize the relative importance of the features (using code from Hands-On Machine Learning by A. Geron). Finally, we select the 70 most important features (pixels) from the training, validation and test images and test our 'best' model on them.
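The top-k selection step can be sketched in numpy (with made-up importances, not the classifier's actual values):

```python
import numpy as np

# Hypothetical importances for 8 features (in practice, rnd_clf.feature_importances_ over 784 pixels)
imp = np.array([0.05, 0.30, 0.10, 0.02, 0.25, 0.08, 0.15, 0.05])

# Indices of the k most important features, in descending order of importance
k = 3
idx = (-imp).argsort()[:k]

# Keep only those columns of a data matrix
X = np.arange(16).reshape(2, 8)
X_small = X[:, idx]
```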

In [170]:
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rnd_clf.fit(train_images,train_labels)
Out[170]:
RandomForestClassifier(random_state=42)
In [171]:
plt.figure(figsize = (12, 8))
plot_digit(rnd_clf.feature_importances_)
cbar = plt.colorbar(ticks=[rnd_clf.feature_importances_.min(), rnd_clf.feature_importances_.max()])
cbar.ax.set_yticklabels(['Not important', 'Very important'])
plt.show()
[Image: heat map of random-forest pixel importances, scaled from 'Not important' to 'Very important']
In [172]:
n = 70
imp_arr = rnd_clf.feature_importances_
idx = (-imp_arr).argsort()[:n]          # get the indices of the 70 "most important" features/pixels
len(idx)
Out[172]:
70

Create Training and Test Examples Leveraging 70 Pixels¶

In [173]:
# Create training, validation and test images using just the 70 pixel locations obtained above
train_images_sm = train_images[:,idx]
val_images_sm = val_images[:,idx]
test_images_sm = test_images[:,idx]
train_images_sm.shape, val_images_sm.shape, test_images_sm.shape # the reduced images have dimension 70
Out[173]:
((55000, 70), (5000, 70), (10000, 70))
In [174]:
# convert a flat pixel index n (0 <= n < size*size) into (row, col) coordinates
def pair(n,size):
    x = n//size 
    y = n%size
    return x,y
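A quick sanity check of the pair helper (repeated here so the snippet stands alone):

```python
# The pair helper from above, repeated so the check is self-contained
def pair(n, size):
    return n // size, n % size

# Flat index 29 in a 28x28 grid lands at row 1, column 1
row, col = pair(29, 28)

# The mapping is invertible: row * size + col recovers the flat index
roundtrip_ok = all(r * 28 + c == n for n in range(784) for r, c in [pair(n, 28)])
```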
In [175]:
plt.figure(figsize = (12, 8))
plt.imshow(train_images[1].reshape(28,28),cmap='binary')
x, y = np.array([pair(k,28) for k in idx]).T
plt.scatter(y,x,color='red',s=20)   # imshow puts the column index on the horizontal axis, so plot (col, row)
Out[175]:
<matplotlib.collections.PathCollection at 0x31d99cb10>
[Image: a sample training digit with the 70 most important pixel locations marked in red]

Build the DNN model¶

In [176]:
model = Sequential([
    Dense(name = 'hidden_layer_1', units=85, activation='relu', input_shape=(70,)),
    Dense(name = 'output_layer', units = 10, activation ='softmax')
])

In [177]:
model.summary() # prints a summary representation of the model
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 hidden_layer_1 (Dense)      (None, 85)                6035      
                                                                 
 output_layer (Dense)        (None, 10)                860       
                                                                 
=================================================================
Total params: 6,895
Trainable params: 6,895
Non-trainable params: 0
_________________________________________________________________
In [178]:
keras.utils.plot_model(model, "mnist_model.png", show_shapes=True)
Out[178]:
[Image: diagram of the model architecture with layer shapes (mnist_model.png)]

Compile the DNN model¶

In [179]:
model.compile(optimizer='rmsprop',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

Train the DNN model¶

In [180]:
history = model.fit(train_images_sm
    , train_labels
    , epochs=30
    , validation_data=(val_images_sm, val_labels)
    , callbacks=[tf.keras.callbacks.ModelCheckpoint("exp_5_optimized.h5",save_best_only=True,save_weights_only=False)] 
    )
Epoch 1/30
1719/1719 [==============================] - 1s 338us/step - loss: 0.6280 - accuracy: 0.8141 - val_loss: 0.4373 - val_accuracy: 0.8750
Epoch 2/30
1719/1719 [==============================] - 1s 304us/step - loss: 0.4143 - accuracy: 0.8756 - val_loss: 0.3550 - val_accuracy: 0.9002
Epoch 3/30
1719/1719 [==============================] - 1s 304us/step - loss: 0.3448 - accuracy: 0.8967 - val_loss: 0.3083 - val_accuracy: 0.9120
Epoch 4/30
1719/1719 [==============================] - 1s 304us/step - loss: 0.3028 - accuracy: 0.9095 - val_loss: 0.2741 - val_accuracy: 0.9190
Epoch 5/30
1719/1719 [==============================] - 1s 305us/step - loss: 0.2751 - accuracy: 0.9175 - val_loss: 0.2630 - val_accuracy: 0.9230
Epoch 6/30
1719/1719 [==============================] - 1s 305us/step - loss: 0.2564 - accuracy: 0.9228 - val_loss: 0.2507 - val_accuracy: 0.9266
Epoch 7/30
1719/1719 [==============================] - 1s 304us/step - loss: 0.2423 - accuracy: 0.9275 - val_loss: 0.2387 - val_accuracy: 0.9326
Epoch 8/30
1719/1719 [==============================] - 1s 302us/step - loss: 0.2318 - accuracy: 0.9306 - val_loss: 0.2432 - val_accuracy: 0.9320
Epoch 9/30
1719/1719 [==============================] - 1s 306us/step - loss: 0.2236 - accuracy: 0.9323 - val_loss: 0.2301 - val_accuracy: 0.9332
Epoch 10/30
1719/1719 [==============================] - 1s 302us/step - loss: 0.2162 - accuracy: 0.9349 - val_loss: 0.2317 - val_accuracy: 0.9334
Epoch 11/30
1719/1719 [==============================] - 1s 306us/step - loss: 0.2093 - accuracy: 0.9374 - val_loss: 0.2198 - val_accuracy: 0.9376
Epoch 12/30
1719/1719 [==============================] - 1s 305us/step - loss: 0.2043 - accuracy: 0.9395 - val_loss: 0.2304 - val_accuracy: 0.9358
Epoch 13/30
1719/1719 [==============================] - 1s 305us/step - loss: 0.1989 - accuracy: 0.9409 - val_loss: 0.2209 - val_accuracy: 0.9380
Epoch 14/30
1719/1719 [==============================] - 1s 305us/step - loss: 0.1953 - accuracy: 0.9422 - val_loss: 0.2250 - val_accuracy: 0.9410
Epoch 15/30
1719/1719 [==============================] - 1s 304us/step - loss: 0.1919 - accuracy: 0.9426 - val_loss: 0.2255 - val_accuracy: 0.9358
Epoch 16/30
1719/1719 [==============================] - 1s 304us/step - loss: 0.1880 - accuracy: 0.9433 - val_loss: 0.2329 - val_accuracy: 0.9328
Epoch 17/30
1719/1719 [==============================] - 1s 302us/step - loss: 0.1850 - accuracy: 0.9452 - val_loss: 0.2215 - val_accuracy: 0.9410
Epoch 18/30
1719/1719 [==============================] - 1s 301us/step - loss: 0.1815 - accuracy: 0.9462 - val_loss: 0.2220 - val_accuracy: 0.9382
Epoch 19/30
1719/1719 [==============================] - 1s 301us/step - loss: 0.1798 - accuracy: 0.9469 - val_loss: 0.2201 - val_accuracy: 0.9384
Epoch 20/30
1719/1719 [==============================] - 1s 302us/step - loss: 0.1761 - accuracy: 0.9480 - val_loss: 0.2210 - val_accuracy: 0.9390
Epoch 21/30
1719/1719 [==============================] - 1s 303us/step - loss: 0.1750 - accuracy: 0.9487 - val_loss: 0.2206 - val_accuracy: 0.9394
Epoch 22/30
1719/1719 [==============================] - 1s 315us/step - loss: 0.1726 - accuracy: 0.9488 - val_loss: 0.2350 - val_accuracy: 0.9340
Epoch 23/30
1719/1719 [==============================] - 1s 308us/step - loss: 0.1707 - accuracy: 0.9497 - val_loss: 0.2178 - val_accuracy: 0.9416
Epoch 24/30
1719/1719 [==============================] - 1s 308us/step - loss: 0.1692 - accuracy: 0.9497 - val_loss: 0.2142 - val_accuracy: 0.9424
Epoch 25/30
1719/1719 [==============================] - 1s 305us/step - loss: 0.1673 - accuracy: 0.9497 - val_loss: 0.2277 - val_accuracy: 0.9388
Epoch 26/30
1719/1719 [==============================] - 1s 306us/step - loss: 0.1657 - accuracy: 0.9512 - val_loss: 0.2218 - val_accuracy: 0.9364
Epoch 27/30
1719/1719 [==============================] - 1s 306us/step - loss: 0.1646 - accuracy: 0.9516 - val_loss: 0.2252 - val_accuracy: 0.9392
Epoch 28/30
1719/1719 [==============================] - 1s 305us/step - loss: 0.1632 - accuracy: 0.9520 - val_loss: 0.2228 - val_accuracy: 0.9436
Epoch 29/30
1719/1719 [==============================] - 1s 306us/step - loss: 0.1620 - accuracy: 0.9520 - val_loss: 0.2253 - val_accuracy: 0.9370
Epoch 30/30
1719/1719 [==============================] - 1s 306us/step - loss: 0.1610 - accuracy: 0.9524 - val_loss: 0.2260 - val_accuracy: 0.9422

Evaluate the DNN model¶

In [181]:
model = tf.keras.models.load_model("exp_5_optimized.h5")
test_loss, test_acc = model.evaluate(test_images_sm, test_labels)
313/313 [==============================] - 0s 260us/step - loss: 0.2242 - accuracy: 0.9368
In [182]:
print(f'test acc: {test_acc}, test loss: {test_loss}')
test acc: 0.9368000030517578, test loss: 0.22416523098945618

Reviewing Performance¶

In [183]:
history_dict = history.history
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
In [184]:
plt.subplots(figsize=(16,12))
plt.tight_layout()
display_training_curves(history_dict['accuracy'], history_dict['val_accuracy'], 'accuracy', 211)
display_training_curves(history_dict['loss'], history_dict['val_loss'], 'loss', 212)
[Image: training and validation accuracy (top) and loss (bottom) curves over 30 epochs]

Making Predictions¶

In [185]:
# Get the predicted classes:
# model.predict_classes() is deprecated in TensorFlow 2.7+,
# so take the argmax over the predicted probabilities instead:
pred_train=model.predict(train_images_sm) 
pred_classes=np.argmax(pred_train, axis=1)
1719/1719 [==============================] - 0s 193us/step
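The argmax step above can be illustrated on toy probabilities (made-up values, not the model's outputs):

```python
import numpy as np

# Toy probability rows for 3 images over 4 classes
probs = np.array([[0.10, 0.70, 0.10, 0.10],
                  [0.60, 0.20, 0.10, 0.10],
                  [0.05, 0.05, 0.20, 0.70]])

# argmax along axis=1 picks the highest-probability class for each row
pred = np.argmax(probs, axis=1)
```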
In [186]:
print_validation_report(train_labels, pred_classes)
Classification Report
              precision    recall  f1-score   support

           0       0.97      0.97      0.97      5444
           1       0.98      0.98      0.98      6179
           2       0.93      0.94      0.94      5470
           3       0.95      0.95      0.95      5638
           4       0.95      0.96      0.95      5307
           5       0.96      0.92      0.94      4987
           6       0.97      0.97      0.97      5417
           7       0.96      0.96      0.96      5715
           8       0.95      0.94      0.95      5389
           9       0.93      0.94      0.93      5454

    accuracy                           0.95     55000
   macro avg       0.95      0.95      0.95     55000
weighted avg       0.95      0.95      0.95     55000

Accuracy Score: 0.9544181818181818
Root Mean Square Error: 0.8948234970702831

Create the confusion matrix¶

In [187]:
conf_mx = tf.math.confusion_matrix(train_labels, pred_classes)
conf_mx;
In [188]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
df = pd.DataFrame(pred_train[0:20], columns = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
df.style.format("{:.2%}").background_gradient(cmap=cm)
Out[188]:
  0 1 2 3 4 5 6 7 8 9
0 0.00% 0.00% 39.53% 1.81% 0.02% 0.00% 0.00% 50.60% 0.01% 8.04%
1 0.00% 0.00% 0.00% 99.65% 0.00% 0.00% 0.00% 0.00% 0.01% 0.34%
2 0.00% 0.00% 0.00% 0.00% 4.25% 0.00% 0.00% 0.04% 0.00% 95.71%
3 0.00% 0.00% 0.00% 0.00% 0.12% 0.00% 99.87% 0.00% 0.00% 0.00%
4 0.00% 99.91% 0.02% 0.01% 0.00% 0.02% 0.00% 0.00% 0.04% 0.00%
5 0.00% 0.00% 0.00% 0.00% 0.02% 0.00% 0.00% 0.00% 99.97% 0.00%
6 0.00% 99.90% 0.03% 0.00% 0.00% 0.00% 0.00% 0.02% 0.05% 0.00%
7 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
8 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 99.99%
9 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.99% 0.00%
10 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
11 0.00% 0.00% 0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
12 0.00% 99.77% 0.09% 0.01% 0.00% 0.01% 0.00% 0.00% 0.13% 0.00%
13 60.69% 0.00% 22.06% 10.27% 0.00% 6.44% 0.00% 0.00% 0.05% 0.50%
14 0.00% 0.00% 0.01% 0.01% 0.01% 0.00% 0.00% 99.91% 0.00% 0.07%
15 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
16 0.00% 0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
17 0.00% 0.00% 1.93% 1.68% 0.00% 0.00% 0.00% 3.45% 0.00% 92.93%
18 0.01% 0.00% 0.05% 0.00% 0.01% 0.00% 99.92% 0.00% 0.02% 0.00%
19 96.07% 0.00% 3.92% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01%

Visualize the confusion matrix¶

In [189]:
mtx = plot_confusion_matrix(train_labels,pred_classes)
[Image: confusion matrix heat map for the training-set predictions]
In [190]:
cl_a, cl_b = 5, 3
X_aa = train_images[(train_labels == cl_a) & (pred_classes == cl_a)]
X_ab = train_images[(train_labels == cl_a) & (pred_classes == cl_b)]
X_ba = train_images[(train_labels == cl_b) & (pred_classes == cl_a)]
X_bb = train_images[(train_labels == cl_b) & (pred_classes == cl_b)]

plt.figure(figsize=(8,8))

p1 = plt.subplot(221)
p2 = plt.subplot(222)
p3 = plt.subplot(223)
p4 = plt.subplot(224)

plot_digits(X_aa[:25], p1, images_per_row=5);
plot_digits(X_ab[:25], p2, images_per_row=5);
plot_digits(X_ba[:25], p3, images_per_row=5);
plot_digits(X_bb[:25], p4, images_per_row=5);


p1.set_title(f"{cl_a}'s classified as {cl_a}'s")
p2.set_title(f"{cl_a}'s classified as {cl_b}'s")
p3.set_title(f"{cl_b}'s classified as {cl_a}'s")
p4.set_title(f"{cl_b}'s classified as {cl_b}'s")

# plt.savefig("error_analysis_digits_plot_EXP1_valid")

plt.show()
[Image: 5x5 grids of sample digits grouped by actual vs. predicted class (5's and 3's)]
In [ ]:
 

ACTIVATION EXTRACTION & SCATTERPLOT¶

  • reducing to 70 pixels via random-forest importance does not perform as well as the 128-node network or the PCA approach
  • accuracy after 30 epochs is lower
In [191]:
# Extracts the outputs of the 2 layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

print(f"There are {len(layer_outputs)} layers")
layer_outputs # description of the layers
There are 2 layers
Out[191]:
[<KerasTensor: shape=(None, 85) dtype=float32 (created by layer 'hidden_layer_1')>,
 <KerasTensor: shape=(None, 10) dtype=float32 (created by layer 'output_layer')>]
In [192]:
# Get the outputs of all the hidden nodes for each of the 55000 training images
activations = activation_model.predict(train_images_sm)
hidden_layer_activation = activations[0]
output_layer_activations = activations[1]
hidden_layer_activation.shape   #  each of the 85 hidden nodes has one activation value per training image
1719/1719 [==============================] - 0s 227us/step
Out[192]:
(55000, 85)
In [193]:
output_layer_activations.shape
Out[193]:
(55000, 10)
In [194]:
print(f"The maximum activation value across the hidden-layer nodes is \
{hidden_layer_activation.max()}")
The maximum activation value across the hidden-layer nodes is 8.1651611328125
In [195]:
# Some stats about the output layer as an aside...
np.set_printoptions(suppress = True)  # display probabilities as decimals and NOT in scientific notation
output_layer_activation = activations[1]
print(f"The output layer has shape {output_layer_activation.shape}")
print(f"The outputs for the first image are {output_layer_activation[0].round(4)}")
print(f"The sum of the probabilities is (approximately) {output_layer_activation[0].sum()}")
The output layer has shape (55000, 10)
The outputs for the first image are [0.    0.    0.395 0.018 0.    0.    0.    0.506 0.    0.08 ]
The sum of the probabilities is (approximately) 1.0
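A minimal numpy softmax (a sketch of the math, not Keras's implementation) shows why each output row sums to 1:

```python
import numpy as np

def softmax(z):
    # subtract the row max before exponentiating, for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
p = softmax(logits)
```

Equal logits (second row) map to a uniform distribution, and every row sums to 1 by construction.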
In [196]:
# To see how closely the hidden node activation values correlate with the class labels
# Let us use seaborn for the boxplots this time.
plt.figure(figsize=(16,10))
bplot = sns.boxplot(y='act_val_0', x='actual_class', 
                 data=activation_df[['act_val_0','actual_class']], 
                 width=0.5,
                 palette="colorblind")
[Image: boxplots of hidden node 0's activation values ('act_val_0') by digit class]
In [197]:
activation_df.groupby("actual_class")["act_val_0"].apply(lambda x: [round(min(x.tolist()),2),
 round(max(x.tolist()),2)]).reset_index().rename(columns={"act_val_0": "range_of_act_values"})
Out[197]:
actual_class range_of_act_values
0 0 [0.0, 5.56]
1 1 [0.0, 5.25]
2 2 [0.0, 5.23]
3 3 [0.0, 6.06]
4 4 [0.0, 3.04]
5 5 [0.0, 8.19]
6 6 [0.0, 6.19]
7 7 [0.0, 4.49]
8 8 [0.0, 5.89]
9 9 [0.0, 4.02]
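The same per-class ranges can be computed more compactly with groupby/agg; a toy sketch with hypothetical values, not the notebook's data:

```python
import pandas as pd

# Toy frame mimicking activation_df's columns (hypothetical values)
df = pd.DataFrame({'actual_class': [0, 0, 1, 1, 1],
                   'act_val_0':    [0.0, 5.5, 1.0, 0.0, 3.0]})

# Per-class min and max via agg, a compact alternative to the lambda above
act_range = df.groupby('actual_class')['act_val_0'].agg(['min', 'max'])
```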
In [198]:
# Get the dataframe of the 154 reduced pixel features (train_images_red, from the earlier dimensionality reduction)
pixel_data = {'actual_class':train_labels}
for k in range(0,154): 
    pixel_data[f"pix_val_{k}"] = train_images_red[:,k]
pixel_df = pd.DataFrame(pixel_data)
pixel_df.head(15).round(3).T
Out[198]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
actual_class 7.000 3.000 4.000 6.000 1.000 8.000 1.000 0.000 9.000 8.000 0.000 3.000 1.000 2.000 7.000
pix_val_0 0.725 0.473 -0.094 0.221 -3.679 1.303 -3.645 6.441 -0.511 1.735 5.038 0.703 -3.438 3.187 -1.907
pix_val_1 -2.433 1.005 -3.010 -0.725 2.086 0.938 2.637 0.618 -2.159 2.033 1.001 2.985 0.737 1.372 -1.918
pix_val_2 1.537 0.502 2.129 -2.279 -0.551 -1.222 -0.458 -1.207 2.395 0.174 1.917 1.507 0.137 0.066 0.498
pix_val_3 -2.445 3.738 0.838 -1.903 -0.906 2.802 -0.120 0.589 -0.949 1.990 -0.634 1.158 -0.161 2.586 -0.127
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
pix_val_149 0.242 0.161 0.033 -0.102 -0.082 -0.114 0.094 0.166 -0.262 -0.170 0.033 -0.044 -0.011 -0.184 -0.051
pix_val_150 -0.115 -0.105 -0.006 0.040 -0.061 0.050 -0.130 -0.191 -0.010 -0.439 0.139 0.018 0.115 0.028 -0.050
pix_val_151 -0.367 -0.218 0.013 0.048 -0.117 -0.117 0.116 -0.141 0.022 -0.228 -0.135 0.150 0.002 0.108 -0.111
pix_val_152 -0.063 0.148 0.125 -0.071 0.007 0.083 -0.046 0.120 0.017 0.283 -0.077 0.002 0.080 0.006 0.256
pix_val_153 -0.264 0.007 -0.035 0.028 0.021 -0.087 0.055 0.071 0.183 0.041 -0.132 0.142 -0.173 0.105 0.268

155 rows × 15 columns

In [199]:
plt.figure(figsize=(16, 10))
color = sns.color_palette("hls", 10)
sns.scatterplot(x="pix_val_77", y="pix_val_78", hue="actual_class",  palette=color, data = pixel_df, legend="full")
plt.legend(loc='upper left');
[Image: scatterplot of pix_val_77 vs. pix_val_78, colored by digit class]
In [200]:
# Time Stamp
current_time = datetime.datetime.now()
formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")

# Print the formatted time
print("Last Run:", formatted_time)
Last Run: 2024-10-07 00:05:05
In [ ]: